In many applications, ranging from character recognition to signal detection
to automatic target identification, the problem of signal classification is
of interest. Often, for example, a signal is known to belong to one of a family of
sets $C_1, \ldots, C_n$, and the goal is to classify the signal according to the
set to which it belongs. The main purpose of this thesis is to show that, under
certain conditions placed on the sets, the theory of uniform approximation can be
applied to solve this problem. Specifically, if we assume that the sets $C_j$ are
compact subsets of a normed linear space, several approaches using the
Stone-Weierstrass theorem give us a specific structure for classification. This
structure is a single hidden layer feedforward neural network. We then discuss the
functions which comprise the elements of this neural network and give an example
of an application.
1. Signal Classification

Signal  classification  is,  quite  simply,  the  process  of  examining  a  signal 
and  determining  a  class,  or  group,  from  which  it  came.  Humans  perform  many 
instances  of  signal  classification  each  day,  often  without  even  knowing  it.  For 
example,  one  might  read  a  signature  (the  signal)  carefully  to  determine  the 
author  (the  class).  This  might  be  a  process  that  would  be  extremely  hard  for  a 
computer  to  perform. 

There are numerous military, civilian, and academic problems that rely on
signal classification. It would be fruitless to attempt to compile an exhaustive
list of applications, so we will state and develop a few problems here in which
the theory of signal classification plays an important role in the solution.

Automatic  Target  Recognition 

The field of automatic target recognition is extremely important, primarily
in the area of the military. The main purpose of automatic target recognition
is  the  use  of  computer  processing  to  detect  and  recognize  signatures  in  sensor 
data  [1].  These  targets  are  most  often  in  a  cluttered  environment  and  frequently 
in  hostile  territory.  They  may  include  such  things  as  aircraft,  missiles,  tanks, 
or  warships.  The  clutter  in  their  background  may  come  from  temperature  or 
pressure  disturbances,  atmospheric  variations,  topographical  objects,  or  even 
other  targets. 

There are typically two steps to an automatic target recognition problem:
detection and identification. Usually some relatively fast and coarse method is
used to detect an object against background noise, and a slower, more precise
method is used to identify it. Features typically extracted from the target once
it is detected include its position, its size and shape, and its speed.

In  order  to  measure  these  quantities,  an  automatic  target  recognition 
system  will  possess  sensors  such  as  high  resolution  cameras  and  complex  radar 
arrays.  These  sensors  will  obtain  data  and  send  it  to  the  processing  portion  of 
the  system.  The  system  will  then  determine  first  whether  a  target  even  exists 
and  then  attempt  to  identify  the  target. 

The second portion of the problem (the identification) is essentially a pure
classification problem. Once it is determined
that  a  tank  is  found,  for  example,  it  is  important  to  be  able  to  quickly  determine 
whether  the  tank  is  friendly  or  hostile.  An  automatic  recognition  system  thus 
frequently  consists  of  several  modules,  one  of  which  is  the  classifier. 

Usually the classifier is designed with the assumption that each input,
once found, belongs to only one of the classes. This assumption will become
important later because it will allow us to make use of some well-known
mathematical theorems in order to determine when classification may be possible.

Pattern  Recognition 

A second application of the theory of signal classification is in the field of
pattern recognition. This is an extremely broad field, concerning a wide range
of problems of practical interest, including character recognition and speech
identification.

One classical application is the reading of characters written either by
hand or by machine. This application has a wide range of uses in government
and industry. For example, computers used by the post office are able to
identify machine-written letters on envelopes in order to sort them. Another
important area is financial institutions. In these cases, the problem typically
involves classifying an input character into one of the thirty-six classes
formed by the twenty-six letters of the alphabet and the ten numerals. The
area of printing is usually prescribed, so it is easy to locate and segment the
characters. Some form of sampling is usually done, and then an algorithm
determines the character.

There  are  also  several  problems  in  the  field  of  speech  recognition  that  rely 
heavily  on  classification  theory.  These  problems  include  the  following:  speaker 
identification,  speaker  verification,  and  isolated  word  recognition  [16].  In  a 
speaker  verification  system,  the  number  of  classes  relates  to  the  number  of 
different  individuals  that  one  wishes  to  recognize.  In  isolated  word  recognition, 
the  number  of  classes  will  depend  on  the  "vocabulary"  of  the  system  and  may 
be  as  large  as  10,000. 

Many problems dealing with pattern recognition are found in the area
of medicine as well. There are many applications that result in continuous
functions, two-dimensional gray scale images, and time-varying images. These
include results from electrocardiograms, electroencephalograms, and X-ray
images, to name a few. Cell analyzers classify blood cells in a population and
determine cell type. Signal classification routines are of enormous importance
in gathering information quickly from these and other biological data.

These  are  just  some  of  the  many  real-world  applications  in  which  signal 
classification  plays  a  very  important  role.  This  makes  it  necessary  to  develop 
routines  which  are  capable  of  performing  well  in  signal  processing  problems.  It 
is  in  this  light  that  we  consider  the  problem  of  determining  a  structure  suitable 
for  classification. 


2.  Neural  Networks 

It has long been recognized that the human brain functions in a completely
different way from the modern digital computer. There has been great
interest in studying how the human brain works and in determining whether it
is feasible to design a model capable of solving problems in a similar manner.
Ramón y Cajal in 1911 introduced the concept of neurons as the basic elements
of the brain [11]. It has been determined that neurons process information
one hundred thousand to one million times slower than a basic silicon logic gate.
The brain compensates for this slower speed by possessing in the neighborhood
of 10 billion neurons and 60 trillion synapses, or interconnections between the
neurons  [21].  As  a  result  the  brain  is  capable  of  performing  many  tasks  at  rates 
much  greater  than  even  the  fastest  computer.  It  is  in  an  attempt  to  emulate 
this  capability  of  the  brain  that  the  field  of  neural  networks,  or  artificial  neural 
networks,  was  born. 

The  history  of  neural  networks  dates  back  to  the  1940's,  when  McCulloch 
and  Pitts  in  1943  proposed  a  computational  model  of  an  element  resembling  a 
neuron  [3].  After  some  initial  research,  the  idea  faded  until  interest  began  to 
return  in  the  1980's.  Since  then,  the  field  of  neural  networks  has  grown  rapidly, 
with  interest  from  researchers  in  a  number  of  fields  ranging  from  engineering  to 
physics  to  psychology. 

A neural network, essentially, is a structure that attempts to model the
way the brain performs some task and then to perform that task in a similar
manner. The structure may be built electronically or simulated in software, for
example. A neural network will contain a large number of individual cells, which
model the neurons, and a number of interconnections between them, which
model the synapses. Often the information passed through the interconnections
will be multiplied by constants in order to achieve a certain task. This is known
as weighting. Haykin gives a definition adapted from Aleksander and Morton
(1990):

A neural network is a massively parallel distributed processor that
has a natural propensity for storing experiential knowledge and
making it available for use. It resembles the brain in two respects:

1. Knowledge is acquired by the network through a learning process.

2. Interneuron connection strengths known as synaptic weights
are used to store the knowledge.

The learning process mentioned here is often an attempt to modify the
interconnection weights in order to accomplish the designated task. This
attempt compares with the well-known field of adaptive filter theory, where filter
weights are adapted over time until they approach a steady-state value.

There are many benefits that arise from neural networks' inherent structure.
The following are some of them (see [11]).

1. Nonlinearity. The functions performed by the neurons are nonlinear;
therefore the entire network, which is a weighted connection of these neurons,
will also be nonlinear. This helps in modeling typical applications,
which are often nonlinear.

2.  Input-output  Mapping.  One  way  in  which  the  values  for  the  weights  used 
in  the  interconnections  of  the  neural  network  are  obtained  is  by  a  process 
called  training.  An  example  input  is  given,  and  weights  are  chosen  so 
that  the  error  between  the  actual  output  and  some  known  desired  output 
is  minimized.  This  training  procedure  is  repeated  until  the  values  of  the 
weights  reach  a  steady  state  (if  possible).  Thus  the  neural  network  learns 
by  creating  an  input-output  mapping. 

3.  Adaptivity.  A  neural  network  has  the  property  of  adapting  its  synaptic 
weights  in  order  to  match  a  change  in  the  surrounding  environment.  When 
it  is  operating  in  one  environment,  it  may  be  retrained  to  operate  in 
another  environment  which  has  only  minimal  changes.  Further,  a  neural 
network  operating  in  a  nonstationary  environment  is  able  to  adapt  its 
weights  in  real  time. 

4.  Evidential  Response.  A  neural  network,  when  faced  with  a  choice,  is  often 
able  not  only  to  select  the  right  choice,  but  to  give  a  confidence  about  the 
choice  it  made.  For  example,  a  neural  network  used  for  classification  and 
given  an  input  signal  may  output  the  class  for  that  signal  as  well  as  how 
sure  it  is  that  that  is  actually  the  correct  class. 

5. Fault Tolerance. Since each of the many neurons in a neural network
stores an important bit of information, the network's power is distributed
over each of these neurons. This allows the network in theory to continue
operating even when one of the neurons fails, though with some degradation
in performance. Neural networks are thus often marked by a gradual
decay in performance instead of a single catastrophic failure.

6. Uniformity of Analysis and Design. Because all neural networks are similar
in a structural sense and the same notation is used in the applications
of neural networks to different problems, they are in a sense universal.
This is seen in the following properties:

•  Neurons  are  common  to  all  neural  networks. 

• This commonality allows for the sharing of information between neural
networks in different applications.

•  It  is  possible  to  build  modular  networks  easily  simply  by  integrating 
the  different  modules.  In  other  words,  parts  of  different  networks 
(or  even  entire  networks)  may  be  used  easily  in  conjunction  with  one 
another  to  create  a  new  network. 

As  neurons  are  the  building  blocks  of  a  neural  network,  their  modeling  is 
most  important.  The  basic  design  for  a  neuron  is  fairly  simple.  A  set  of  synapses 
are  input  to  the  neuron.  These  interconnections  are  weighted  by  real  numbers, 
the  synaptic  weights.  These  weighted  values  are  then  summed.  Finally,  this  sum 
is  passed  through  a  (typically)  nonlinear  activation  function.  This  function 
usually  serves  to  limit  the  output  of  the  neuron  to  some  desired  range,  for 
example $[0, 1]$ or $[-1, 1]$. An example of this model of a neuron is shown in
Figure 1.

Figure 1: Nonlinear model of a neuron
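The neuron model described above (weighted inputs, a sum, then an activation function) can be sketched in a few lines. This is only an illustration under assumptions: the function names, the example weights, and the choice of `tanh` and the logistic function as activations are not taken from the thesis, and no bias or threshold term is included.

```python
import math

def neuron_output(inputs, weights, activation=math.tanh):
    """Weighted sum of the inputs followed by an activation function."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum)

def logistic(s):
    """Logistic sigmoid; its values lie in the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))

# Hypothetical inputs and synaptic weights, chosen only for illustration.
inputs = [1.0, -2.0, 0.5]
weights = [0.4, 0.1, -0.6]

y_tanh = neuron_output(inputs, weights)                 # output in (-1, 1)
y_logistic = neuron_output(inputs, weights, logistic)   # output in (0, 1)
```

Swapping the activation changes only the output range, which matches the remark that the activation usually serves to limit the neuron's output.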

While the neurons themselves are modeled more or less the same regardless
of the application, there are different architectures for the actual network.
We will be concerned with just one particular type, called a feed-forward
network with one hidden layer. This network architecture consists of a large
number of neurons arranged schematically in three layers. This may be seen in
Figure 2.

In  theory,  each  unit  of  the  input  layer  may  be  connected  to  each  unit  of 
the  hidden  layer.  This  connection  has  a  weight,  which  as  mentioned  above  is  a 
real number, associated with it. The weights are denoted by $w_{ij}$. So each unit
on  the  hidden  layer  receives  a  weighted  sum  of  elements  from  the  input  layer 
and  then  processes  this  sum  with  an  activation  function.  Finally,  the  result  of 
this  activation  is  transmitted  to  the  output  layer  with  another  set  of  weights 
and  then  summed.  The  result  for  the  network  structure  shown  in  Figure  2  is: 


Figure  2:  A  feed-forward  neural  network 
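The computation performed by such a network (weighted sums into the hidden layer, an activation, then a weighted sum at the output) might be sketched as follows. The formula from the original text is not reproduced here, so this code is an assumption about its general form: a single output unit, no bias or threshold terms, `tanh` as the activation, and the index convention `hidden_weights[j][i]` are all illustrative choices.

```python
import math

def feedforward(x, hidden_weights, output_weights, activation=math.tanh):
    """Output of a single-hidden-layer network with one output unit.

    hidden_weights[j][i] is the weight from input unit i to hidden unit j
    (an assumed index convention); output_weights[j] is the weight from
    hidden unit j to the output unit.
    """
    # Each hidden unit applies the activation to its weighted input sum.
    hidden = [activation(sum(w * xi for w, xi in zip(row, x)))
              for row in hidden_weights]
    # The output unit forms a weighted sum of the hidden-unit outputs.
    return sum(v * h for v, h in zip(output_weights, hidden))

# Hypothetical values: two inputs, two hidden units, one output.
x = [1.0, 2.0]
W = [[0.5, -0.25], [0.0, 1.0]]
v = [2.0, -1.0]
y = feedforward(x, W, v)
```

With these numbers the first hidden unit receives a sum of 0 and the second a sum of 2, so the output equals `-tanh(2)`.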

Finally,  it  is  important  to  note  that  it  is  not  necessarily  possible  to 
solve  any  problem  simply  by  constructing  a  neural  network  at  random  and  then 
attempting  to  train  the  weights.  It  is  important  to  determine  when  a  solution 
will  be  possible  and  what  structure  of  network  to  try.  Later  it  will  be  shown  that 
a  certain  type  of  neural  network  is  capable  of  solving  an  important  classification 
problem. 
3. Background


Metric  Spaces 


A type of space that will play a particularly important role in the study
of approximation is the metric space. Metric spaces are described in detail in
many books, for example [9], [13], and [18].

Definition: A metric space is a pair $(X, \rho)$ where $X$ is a set of elements and
$\rho$ is a metric, or distance function, that is nonnegative and real-valued with
the following properties:

1. $\rho(x, y) = 0$ if and only if $x = y$;

2. $\rho(x, y) = \rho(y, x)$;

3. $\rho(x, y) + \rho(y, z) \ge \rho(x, z)$.

Some  examples  of  metric  spaces  are: 

Example 1: The set of real numbers with metric $\rho(x, y) = |x - y|$, referred to
as $\mathbb{R}$ or $\mathbb{R}^1$.

Example 2: The set of all ordered $n$-tuples $x = (x_1, x_2, \ldots, x_n)$ of real
numbers, with metric $\rho(x, y) = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2}$. This space
is generally referred to as $\mathbb{R}^n$.

Example 3: The set of continuous functions defined on a closed interval $[a, b]$
with metric $\rho(f, g) = \max_{a \le t \le b} |f(t) - g(t)|$.

Example 4: This same set of continuous functions along with the metric

$$\rho(f, g) = \left( \int_a^b [f(t) - g(t)]^2 \, dt \right)^{1/2}$$

forms a different yet equally valid metric space (the metric is that of
$L_2[a, b]$). Thus, the metric as well as the set of points must be known in order
for the space to be completely determined.
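The four metrics in these examples can be evaluated numerically. The sketch below is an illustration only: the function names are hypothetical, and the two function-space metrics are approximated on a uniform grid (a sampled maximum and the trapezoidal rule), so they are estimates rather than exact values.

```python
import math

def metric_real(x, y):
    """Metric of Example 1: |x - y| on the real line."""
    return abs(x - y)

def metric_euclidean(x, y):
    """Metric of Example 2: Euclidean distance between n-tuples."""
    return math.sqrt(sum((xk - yk) ** 2 for xk, yk in zip(x, y)))

def metric_sup(f, g, a, b, samples=1001):
    """Approximate max |f(t) - g(t)| on [a, b] using a uniform grid."""
    ts = [a + (b - a) * k / (samples - 1) for k in range(samples)]
    return max(abs(f(t) - g(t)) for t in ts)

def metric_l2(f, g, a, b, samples=1001):
    """Approximate the integral metric of Example 4 by the trapezoidal rule."""
    h = (b - a) / (samples - 1)
    vals = [(f(a + h * k) - g(a + h * k)) ** 2 for k in range(samples)]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return math.sqrt(integral)

f = lambda t: t
g = lambda t: 0.0
d_sup = metric_sup(f, g, 0.0, 1.0)   # exact value is 1
d_l2 = metric_l2(f, g, 0.0, 1.0)     # exact value is 1/sqrt(3)
```

The same pair of functions is at distance 1 in the maximum metric but at distance about 0.577 in the integral metric, which illustrates why the metric must be specified along with the set.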

Let $X$ be a metric space with $x_0 \in X$ and let $r > 0$. We define an open
ball with radius $r$ centered about $x_0$ (written $b(x_0, r)$) to be the set of
points $x \in X$ such that $\rho(x, x_0) < r$. Let $A \subset X$. We define a point
$x \in A$ to be an interior point of the set $A$ if $b(x, r) \subset A$ for some
$r > 0$. That is, we can find an open ball surrounding the point $x$ such that
every point in the ball belongs to the set $A$. It is in this way that we go about
defining open sets in a metric space: a set $A \subset X$ is called an open set if
all of its points are interior points.

Example 1: Consider the set $(0, 1)$ in $\mathbb{R}$. Given any point in the set,
it is possible to choose an open ball of some radius such that the ball is
contained in $(0, 1)$. Therefore, $(0, 1)$ is open in $\mathbb{R}$.

Example 2: On the other hand, consider the set $[0, 1)$ in $\mathbb{R}$ and look at
any open ball about the point 0 with radius $r$. Whatever the choice of $r$, there
will be points contained in the ball that are not in $[0, 1)$ (for example, the
point $-r/2$); therefore the point 0 is not an interior point of the set $[0, 1)$,
and the set is not open.

Let $X$ be a metric space and $x \in X$. We define a neighborhood of $x$ as a
set containing an open set containing $x$. This open set will necessarily contain
an open ball $b(x, \varepsilon)$ for some $\varepsilon > 0$. Therefore, every
neighborhood of a point will contain an open ball about that point. Again let $X$
be a metric space and let $A \subset X$. A point $x \in X$ is called a contact
point of $A$ if every neighborhood of $x$ contains at least one point in $A$.
Obviously all $x \in A$ are contact points of $A$. If every neighborhood of $x$
contains infinitely many points in $A$, then $x$ is called a limit point of $A$.
Note that a limit point is necessarily a contact point by definition. The closure
of a set $A$, written as $\bar{A}$, is simply the set of all the contact points of
$A$. A set which coincides with its closure ($A = \bar{A}$) is known as a closed
set.

Example 1: Consider again the set $[0, 1)$ in $\mathbb{R}$. It is not possible to
find an open ball about the point 1 that does not contain any points in $[0, 1)$.
Therefore every neighborhood of 1 contains at least one point (in fact, every
neighborhood contains infinitely many points) in the set $[0, 1)$. This implies
that 1 is a contact point (and a limit point) of the set $[0, 1)$. Since
$1 \notin [0, 1)$, the set does not coincide with its closure (in fact, as
expected, $\overline{[0, 1)} = [0, 1]$) and is therefore not a closed set.

Example  2:  On  the  other  hand,  the  set  [0, 1]  can  be  shown  to  be  closed  as  its 
closure  is  the  very  same  set  [0, 1]. 

One of the most important concepts concerning metric spaces is that of
continuity. Let $(X, \rho_X)$ and $(Y, \rho_Y)$ be metric spaces and let $f$ be a
function such that $f : X \to Y$. Then $f$ is continuous at the point $p \in X$ if
for every $\varepsilon > 0$ there exists a $\delta > 0$ such that
$\rho_Y(f(x), f(p)) < \varepsilon$ whenever $\rho_X(x, p) < \delta$.

A sequence $\{x_n\}$ in a metric space $X$ is said to converge if there is a point
$p \in X$ with the following property: for every $\varepsilon > 0$ there is an
integer $N$ such that $n \ge N$ implies that $\rho(x_n, p) < \varepsilon$. We write
this as $x_n \to p$ or $\lim x_n = p$. We define $\{x_n\}$ to be a Cauchy sequence
in a metric space $X$ if for every $\varepsilon > 0$ there exists a positive
integer $N$ such that $\rho(x_n, x_m) < \varepsilon$ for $n, m \ge N$. We can
easily show that every convergent sequence is a Cauchy sequence; the converse
holds in $\mathbb{R}$ but not in every metric space. A metric space is said to be
complete if every Cauchy sequence converges to a point in the space. The
completeness of certain metric spaces is very important to proving results in
those spaces.
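A standard illustration of a space that is not complete is the set of rational numbers with the metric $|x - y|$. The sketch below is an assumption-laden illustration, not part of the original argument: it uses rational Newton iterates for the square root of 2 (the function name `newton_sqrt2` is hypothetical) and exact `Fraction` arithmetic. A finite computation cannot prove the Cauchy property; it only shows successive terms drawing together while no term squares to exactly 2.

```python
from fractions import Fraction

def newton_sqrt2(steps):
    """Rational Newton iterates x_{n+1} = (x_n + 2/x_n)/2, starting from 2."""
    x = Fraction(2)
    seq = [x]
    for _ in range(steps):
        x = (x + 2 / x) / 2
        seq.append(x)
    return seq

seq = newton_sqrt2(6)

# Distances between consecutive terms shrink rapidly.
gaps = [abs(seq[n + 1] - seq[n]) for n in range(len(seq) - 1)]

# Every term is rational, so none of them can square to exactly 2.
never_exact = all(x * x != 2 for x in seq)
```

The terms approach a limit whose square is 2, and that limit is not rational, so the sequence has no limit inside the rationals even though its terms cluster together.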

In a similar manner, we say that a sequence of functions $\{f_n\}$ from $X$ to
$\mathbb{R}$ converges uniformly on $X$ to a function $f$ if for every
$\varepsilon > 0$ there exists an integer $N$ such that $n \ge N$ implies
$|f_n(x) - f(x)| < \varepsilon$ for all $x \in X$. We often write this as
$f_n \to f$ uniformly. For a discussion in greater depth of convergence, see
[19].
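The difference between uniform convergence and mere pointwise convergence can be seen numerically. In the sketch below, both example sequences and all names are illustrative assumptions: $g_n(x) = x/n$ converges to zero uniformly on $[0, 1]$, while $h_n(x) = nx/(1 + n^2 x^2)$ converges to zero at every point but always attains the value $1/2$ at $x = 1/n$. The supremum is estimated on a uniform grid.

```python
def sup_distance(f, g, a=0.0, b=1.0, samples=1001):
    """Approximate sup |f(x) - g(x)| over [a, b] on a uniform grid."""
    xs = [a + (b - a) * k / (samples - 1) for k in range(samples)]
    return max(abs(f(x) - g(x)) for x in xs)

zero = lambda x: 0.0

def g_n(n):
    """g_n(x) = x / n, which tends to 0 uniformly on [0, 1]."""
    return lambda x: x / n

def h_n(n):
    """h_n(x) = n x / (1 + (n x)^2), which tends to 0 only pointwise."""
    return lambda x: n * x / (1.0 + (n * x) ** 2)

uniform_gaps = [sup_distance(g_n(n), zero) for n in (1, 10, 100)]
nonuniform_gaps = [sup_distance(h_n(n), zero) for n in (1, 10, 100)]
```

The first list shrinks like $1/n$, while the second stays near $1/2$, so no single $N$ works for all $x$ in the second case.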

Topological  Spaces 

Although  metric  spaces  are  usually  the  most  general  space  needed,  there 
may  be  times  when  a  result  may  be  proved  for  a  more  general  space.  It  is  for 
this  purpose  that  we  now  introduce  the  topological  space. 

Definition: A topological space is the pair $(X, \tau)$ consisting of a set of
points $X$ and a topology $\tau$, where $\tau$ is a family of subsets
$G \subset X$, called open sets, with the following properties:

1. The set $X$ itself and the empty set $\emptyset$ belong to $\tau$.

2. Arbitrary unions $\bigcup_\alpha G_\alpha$ and finite intersections
$\bigcap_{k=1}^{n} G_k$ of open sets belong to $\tau$.


The definitions of open and closed sets in a topological space $X$ are quite
simple. A set $A \subset X$ is an open set if $A$ belongs to $\tau$. A set $B$ in a
topological space $X$ is a closed set if its complement $X - B$ is open.

We can also extend the concepts of a neighborhood, contact point, limit
point, and closure of a set to a topological space. By a neighborhood of $x$, we
mean any open set $G$ containing $x$. A point $x \in X$ is a contact point of
$T \subset X$ if every neighborhood of $x$ contains at least one point in $T$. A
point $x \in X$ is a limit point of $T \subset X$ if every neighborhood of $x$
contains infinitely many points in $T$. Finally, the closure of a subset $T$ of a
topological space $X$ is the set of all the contact points of $T$.

Two important types of topological spaces are Hausdorff spaces and normal
spaces. A topological space $X$ is called a Hausdorff space if:

1. Sets consisting of single points are closed.

2. For every pair of distinct points $x$ and $y$ in $X$, there are disjoint
neighborhoods of $x$ and $y$.

A  topological  space  is  called  a  normal  space  if: 

1.  Sets  consisting  of  single  points  are  closed. 
2. For every pair of disjoint closed sets $A$ and $B$, there are disjoint
neighborhoods of $A$ and $B$.

Obviously, every normal space is Hausdorff, though a Hausdorff space need
not be normal. It can be verified that all metric spaces are topological spaces
simply by taking $\tau$ to be the family of sets that are open in the metric
space in the usual sense. This is very important as it allows any result relating
to topological spaces to be applied to metric spaces as well. In fact, we get an
even better result: all metric spaces are normal (and therefore Hausdorff). The
converses of both of these statements, however, are not true.

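Example: The topological space consisting of only two points $\{0, 1\}$, where
$\tau$ consists only of the sets $\{0, 1\}$ (the entire space) and $\emptyset$, is
not a metric space: the points 0 and 1 have no disjoint neighborhoods, so the
space is not even Hausdorff.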
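For a finite set, the open-set axioms and the separation condition for distinct points can be checked by brute force. The sketch below is an illustration with hypothetical function names; it checks only the second Hausdorff condition (disjoint neighborhoods of distinct points), and it relies on the fact that, for a finite family, closure under pairwise unions and intersections is enough.

```python
from itertools import combinations

def is_topology(X, tau):
    """Check the open-set axioms for a family tau of subsets of a finite set X."""
    X = frozenset(X)
    tau = {frozenset(G) for G in tau}
    if any(not G <= X for G in tau):
        return False
    if X not in tau or frozenset() not in tau:
        return False
    # For a finite family, pairwise closure implies closure under
    # arbitrary unions and finite intersections.
    return all((A | B) in tau and (A & B) in tau
               for A, B in combinations(tau, 2))

def separates_points(X, tau):
    """Check that distinct points have disjoint open neighborhoods."""
    tau = [frozenset(G) for G in tau]
    return all(
        any(x in U and y in V and not (U & V) for U in tau for V in tau)
        for x, y in combinations(sorted(X), 2)
    )

points = {0, 1}
indiscrete = [set(), {0, 1}]            # the two-point example above
discrete = [set(), {0}, {1}, {0, 1}]    # every subset is open

indiscrete_is_topology = is_topology(points, indiscrete)
indiscrete_separates = separates_points(points, indiscrete)
discrete_separates = separates_points(points, discrete)
```

The two-point family passes the topology check but fails the separation check, whereas the family of all subsets passes both.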

Continuity in a topological space is defined somewhat differently from
continuity in a metric space. Let $(X, \tau_X)$ and $(Y, \tau_Y)$ be two
topological spaces and let $f : X \to Y$. Then $f$ is continuous if
$f^{-1}(A) \in \tau_X$ for every $A$ in $\tau_Y$. In other words, continuity means
that the inverse image of every open set is open.

A family $\mathcal{M}$ of subsets $M_\alpha$ of a topological space $X$ is called
a cover of $X$ if $X \subset \bigcup_\alpha M_\alpha$. If the family consists
entirely of open sets, then we call the family an open cover. A topological space
is compact if every open cover has a finite subcover.

Although metric spaces possess many of the nice properties that we
would like to have for topological spaces, it is not true that all metric spaces
are compact. There are some theorems (see for example [14]), however, that
allow us to determine whether a given metric space is compact without having
to view it as a topological space.

Let $A$ and $B$ be subsets of the metric space $X$ and let $\varepsilon > 0$. Then
the set $A$ is called an $\varepsilon$-net for the set $B$ if for any $x \in B$
there exists a point $x_a \in A$ such that $\rho(x, x_a) < \varepsilon$.

Theorem 1 (Hausdorff). For compactness of a set $M$ of a metric space $X$ it
is necessary that there should exist a finite $\varepsilon$-net of the set $M$ for
every $\varepsilon > 0$. If the space $X$ is complete and $M$ is closed, then the
condition is also sufficient.

Roughly speaking, a closed set in a complete metric space is compact if, for
every radius $\varepsilon > 0$, we can find a finite number of points such that the
open balls of radius $\varepsilon$ centered at those points together contain the
set. There are some improvements to this if we consider certain specific spaces.
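A finite $\varepsilon$-net is easy to construct for the interval $[0, 1]$ with the usual metric. The sketch below is only an illustration: the function names and the equally spaced construction are assumptions, and coverage is verified on a finite sample of points rather than on the whole interval.

```python
import math

def epsilon_net(eps):
    """Equally spaced centers forming a finite eps-net for [0, 1]."""
    n = math.ceil(1.0 / eps)            # spacing 1/n is at most eps
    return [k / n for k in range(n + 1)]

def is_covered(points, net, eps):
    """Check that every sample point lies within eps of some center."""
    return all(min(abs(x - c) for c in net) < eps for x in points)

# Sample points of [0, 1] used for the check (a finite stand-in for the set).
samples = [k / 997 for k in range(998)]

nets = {eps: epsilon_net(eps) for eps in (0.5, 0.1, 0.01)}
covered = {eps: is_covered(samples, net, eps) for eps, net in nets.items()}
```

Smaller values of $\varepsilon$ require more centers, but the number of centers remains finite for each fixed $\varepsilon$.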

Example 1: (Heine-Borel). A subset of $\mathbb{R}$ is compact if and only if it is
closed and bounded.

Example 2: (Arzelà). The functions of a set $A$ are said to be uniformly
bounded if there exists a constant $K$ such that $|x(t)| < K$ for all $t$ and all
$x \in A$. The same functions are equicontinuous if given $\varepsilon > 0$, there
exists a $\delta > 0$ such that $|x(t_1) - x(t_2)| < \varepsilon$ for all
$x \in A$ whenever $|t_1 - t_2| < \delta$. A set $A \subset C[0, 1]$, the space
of real-valued continuous functions on the closed interval $[0, 1]$, is compact if
and only if $A$ is closed and the functions $x \in A$ are uniformly bounded and
equicontinuous.

Linear  Spaces 

We  now  introduce  the  concept  of  a  linear  space. 

Definition :  A  nonempty  set  L  is  called  a  linear  space  if  it  satisfies  the  following 
axioms: 

1. Any two elements $x \in L$, $y \in L$ uniquely determine a third element
$x + y \in L$, called the sum of $x$ and $y$, that satisfies the following
properties:

(a) $x + y = y + x$ (commutativity);

(b) $(x + y) + z = x + (y + z)$ (associativity);

(c) $L$ contains an element 0, called the zero element, such that for all
$x \in L$, $x + 0 = x$;

(d) For each $x \in L$, there exists an element $-x \in L$ such that
$x + (-x) = 0$, where 0 is the zero element;

2. There exists a product operation such that any element $x \in L$ and any
number $\alpha$ determine a unique element $\alpha x \in L$ such that:

(a) $\alpha(\beta x) = (\alpha\beta)x$;

(b) $1x = x$;

3. The operations of addition and multiplication obey the following
distributive axioms:

(a) $(\alpha + \beta)x = \alpha x + \beta x$;

(b) $\alpha(x + y) = \alpha x + \alpha y$.

The elements $x, y, \ldots$ of a linear space are often called vectors, and the
entire space is often called a vector space. The numbers $\alpha, \beta, \ldots$
are referred to as scalars and the entire set of allowable scalars is referred to
as the field. Typically, the field is the set of real numbers, in which case the
space is referred to as a real linear space. A subset $L_0$ of a linear space $L$
is referred to as a linear subspace of $L$ if $L_0$ itself is a linear space over
the same field as $L$.

A linear space need not possess any topology whatsoever, as long
as it satisfies the three properties above. However, in many applications the
concepts of a linear space and topological space are combined. A space that
is both a linear space and a topological space is referred to either as a linear
topological space or a topological vector space. We require additionally only
that the vector operations of addition and scalar multiplication (which are not
always the usual addition and multiplication) be continuous in the topology
$\tau$. It is also possible to apply the concept of a metric to a linear space, but
what is more useful is to define an operation a bit more specific than a metric,
called a norm, and apply it to a linear space.

Normed  Linear  Spaces 

Definition: A linear space $L$ equipped with an operation called a norm
($\|\cdot\|$) is called a normed linear space if $\|\cdot\|$ satisfies the
following three properties:

1. $\|x\| \ge 0$ for all $x$, where $\|x\| = 0$ if and only if $x = 0$;

2. $\|\alpha x\| = |\alpha| \, \|x\|$ for all $x \in L$ and all $\alpha$;

3. $\|x + y\| \le \|x\| + \|y\|$ for all $x$ and $y$ in $L$.

Just  as  every  metric  space  is  also  a  topological  space,  every  normed  linear 
space  may  also  be  considered  a  metric  space  (and  therefore  a  topological  space 
as  well)  by  taking  the  metric  to  be: 

$$\rho(x, y) = \|x - y\|.$$

Again,  the  converse  is  not  true. 

Example: The metric space consisting of the closed interval $[0, 1]$ with the
"discrete metric" $\rho(x, y) = 1$ if $x \ne y$ and $\rho(x, x) = 0$ cannot be made
into a normed linear space.
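One way to see why a discrete metric cannot come from a norm is that any metric of the form $\|x - y\|$ scales with the vectors: multiplying both points by a scalar multiplies their distance by the absolute value of that scalar. The sketch below checks this property at a single pair of real numbers; it is an illustration on the real line with hypothetical function names, not a proof and not the argument of the original text.

```python
def norm_metric(x, y):
    """Metric induced on the real line by the norm |x|."""
    return abs(x - y)

def discrete_metric(x, y):
    """Discrete metric: 0 for equal points, 1 otherwise."""
    return 0.0 if x == y else 1.0

def is_homogeneous(metric, x, y, alpha):
    """Test rho(alpha*x, alpha*y) == |alpha| * rho(x, y) at one triple."""
    return metric(alpha * x, alpha * y) == abs(alpha) * metric(x, y)

norm_ok = is_homogeneous(norm_metric, 0.25, 0.5, 2.0)
discrete_ok = is_homogeneous(discrete_metric, 0.25, 0.5, 2.0)
```

The norm-induced metric passes the check, while the discrete metric gives distance 1 both before and after scaling and so fails it.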

A  normed  linear  space  that  is  complete  (in  the  same  sense  that  a  metric 
space  is  complete)  is  known  as  a  Banach  space. 

One  special  Banach  space  is  called  a  Hilbert  space. 

Definition: A Hilbert space is a Banach space with the norm
$\|x\| = \langle x, x \rangle^{1/2}$, where $\langle \cdot, \cdot \rangle$ is an
inner product with the following properties (assuming the space is real):

1. $\langle x, y \rangle = \langle y, x \rangle$;

2. $\langle \alpha_1 x_1 + \alpha_2 x_2, y \rangle = \alpha_1 \langle x_1, y \rangle + \alpha_2 \langle x_2, y \rangle$;

3. $\langle x, x \rangle > 0$ for all $x \ne 0$.

The most common example of a Hilbert space is the $n$-dimensional space
$\mathbb{R}^n$, with the Euclidean norm $\|x\| = \sqrt{\sum_{i=1}^{n} x_i^2}$,
where $x = (x_1, x_2, \ldots, x_n)$.

The  Hahn-Banach  Theorem  and  Separation  in  Linear  Spaces 

One  of  the  most  important  and  fundamental  results  in  all  real  analysis 
is  the  Hahn-Banach  theorem.  There  are  many  different  forms  of  the  theorem 
and  in  most  cases  any  version  of  the  theorem  can  be  used  to  directly  prove 
any other version. It is first necessary to introduce the idea of convex sets and
convex functionals.

Definition : A set $M \subset L$ is called a convex set if for each pair of points $x$,
$y \in M$, all points on the line segment joining $x$ and $y$ (that is, all points of the
form $kx + (1 - k)y$, $0 \le k \le 1$) are also elements of $M$.

Definition : A functional $p$ defined on a real linear space $L$ is said to be convex
if it has the following properties:

1. $p(\alpha x) = \alpha p(x)$ for all $x \in L$ and all $\alpha > 0$;

2. $p(x + y) \le p(x) + p(y)$ for all $x, y \in L$.

We  now  turn  to  the  idea  of  extending  a  linear  functional.  Suppose  we 
have  a  linear  functional  defined  on  a  certain  subspace.  We  want  to  know  whether 
there  exists  a  linear  functional  on  the  entire  space  that  is  equal  to  our  first 
functional on the subspace. The Hahn-Banach theorem tells us when this is
possible.

Theorem 2 (Hahn-Banach) Let $p$ be a finite convex functional defined on a
real linear space $L$ and let $L_0$ be any subspace of $L$. Let $f_0$ be any linear
functional on $L_0$ satisfying the condition

$$ f_0(x) \le p(x) $$

on $L_0$. Then there exists a linear functional $f$ on $L$, called the extension of $f_0$,
such that $f = f_0$ at every point of $L_0$ and $f(x) \le p(x)$ on $L$.

Proof: We can assume that $L_0 \ne L$. Let $z$ be any element of $L \setminus L_0$, and let
$L'$ be the subspace generated by $L_0$ and the element $z$, this being the set of all
linear combinations of the form $x + tz$ ($x \in L_0$, $t \in \mathbb{R}$). For $f$ to be an extension
of $f_0$ onto $L'$, we need

$$ f(x + tz) = f(x) + f(tz) = f_0(x) + t f(z). $$

Now, let $c = f(z)$ and note that if $f$ is an extension onto $L'$ dominated by $p$, then
$f_0(x) + tc \le p(x + tz)$. Dividing by $|t|$ and using property 1 of $p$, this condition
translates to the two conditions:

$$ c \le p(x/t + z) - f_0(x/t) \ \text{ if } t > 0 \quad\text{and}\quad c \ge -p(-x/t - z) - f_0(x/t) \ \text{ if } t < 0. $$

So what remains is to show that there is always a $c$ satisfying these conditions.
In this light, let $y_1$ and $y_2$ be elements of $L_0$. Then

$$ f_0(y_2 - y_1) = f_0(y_2) - f_0(y_1) \le p(y_2 - y_1) = p((y_2 + z) - (y_1 + z)) \le p(y_2 + z) + p(-y_1 - z). $$

So we get

$$ -f_0(y_2) + p(y_2 + z) \ge -f_0(y_1) - p(-y_1 - z). $$

Now let $c_1 = \sup_{y_1} [-f_0(y_1) - p(-y_1 - z)]$ and $c_2 = \inf_{y_2} [-f_0(y_2) + p(y_2 + z)]$.
Then $c_2 \ge c_1$ and it simply remains to choose $c_2 \ge c \ge c_1$ and note that $c$
satisfies the necessary conditions. So the functional $f$ defined on $L'$ by
$f(x + tz) = f_0(x) + tc$ satisfies the condition $f(x) \le p(x)$ for $x \in L'$. An induction
argument not given here extends the result to the case when $L'$ is the entire space $L$.
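The single extension step in the proof above can be checked numerically in a small case. The following is only a sketch under assumptions that are not in the text: the space is $\mathbb{R}^2$, $L_0$ is the first coordinate axis, $z = (0, 1)$, $p$ is the $\ell_1$ norm, and $f_0(t, 0) = t$. The supremum and infimum are taken over a finite grid of $L_0$ rather than all of $L_0$.

```python
# Sketch: one extension step of the Hahn-Banach proof in R^2.
# Assumptions (not from the text): L0 = span{(1, 0)}, z = (0, 1),
# p is the l1 norm, and f0(t, 0) = t.

def p(v):
    """Convex functional: the l1 norm on R^2."""
    return abs(v[0]) + abs(v[1])

def f0(y):
    """Linear functional on L0 = {(t, 0)}; it satisfies f0 <= p there."""
    return y[0]

z = (0.0, 1.0)
# Sample points y = (t, 0) of L0 on a grid that contains t = 0.
grid = [(k / 10.0, 0.0) for k in range(-100, 101)]

# c1 = sup over y1 of -f0(y1) - p(-y1 - z)
c1 = max(-f0(y) - p((-y[0] - z[0], -y[1] - z[1])) for y in grid)
# c2 = inf over y2 of -f0(y2) + p(y2 + z)
c2 = min(-f0(y) + p((y[0] + z[0], y[1] + z[1])) for y in grid)

def extend(c):
    """Candidate extension f(x + t z) = f0(x) + t c on all of R^2."""
    return lambda v: f0((v[0], 0.0)) + v[1] * c

def dominated(f, n=20):
    """Check f(v) <= p(v) on a finite grid of points v in R^2."""
    pts = [(i / 4.0, j / 4.0)
           for i in range(-n, n + 1) for j in range(-n, n + 1)]
    return all(f(v) <= p(v) + 1e-12 for v in pts)
```

In this example the admissible interval is approximately $[c_1, c_2] = [-1, 1]$: any $c$ in it gives an extension dominated by $p$ on the test grid, while a value outside it, such as $c = 1.5$, does not.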

By  applying  the  Hahn-Banach  theorem,  we  may  show  a  somewhat  more 
useful  result,  given  in  [2]. 

Theorem 3 Let $f$ be a bounded linear functional defined on the subspace $L$ of
the real normed linear space $X$. Then, there exists a bounded linear functional
$F$ defined on the entire space $X$ so that $F(x) = f(x)$ for $x \in L$ and $\|F\| = \|f\|$.$^{1}$

Proof: Since $f$ is a bounded linear functional, for $x \in L$ we have $|f(x)| \le \|f\| \|x\|$.
For $x \in X$ define $p(x) = \|f\| \|x\|$. It is then easy to show that $p$ is convex and
that $f(x) \le p(x)$ on $L$. By the Hahn-Banach Theorem, extend $f$ to a new functional
$F$ defined on all of $X$ such that $F(x) \le p(x) = \|f\| \|x\|$ and $F(x) = f(x)$ for
$x \in L$. Since also $-F(x) = F(-x) \le p(-x) = \|f\| \|x\|$, $F$ is bounded and
$\|F\| \le \|f\|$. Similarly, if $x \in L$, then $|f(x)| = |F(x)| \le \|F\| \|x\|$, implying
$\|f\| \le \|F\|$. Combining the two inequalities, we see that $\|F\| = \|f\|$ and the
proof is complete.


1 The norm operator $\|\cdot\|$, when applied to a bounded linear functional on a normed linear
space $X$ (as is the case here) is defined as $\|f\| = \sup_{\|x\| \le 1} |f(x)|$. Further, $\|f\|$ can easily be
shown to have the following properties: $\|f\| = \sup_{x \ne 0} |f(x)| / \|x\|$, and $|f(x)| \le \|f\| \|x\|$ for all $x \in X$.

We now turn to perhaps the most useful corollary of the Hahn-Banach
theorem. It is very desirable in many situations to know that there are a
sufficient number of bounded linear functionals defined on a space to strictly
separate the elements of that space. By strictly separate, we mean that for any
two distinct elements $x_1$ and $x_2$ of a normed linear space $X$, there exists an $f \in X^*$,
the set of bounded linear functionals on $X$, such that $f(x_1) - f(x_2) \ne 0$. We prove
this in the context of the following theorem.

Theorem 4 Let $X$ be a normed linear space and $x_0 \in X$, $x_0 \ne 0$. Then there
exists an $F \in X^*$ such that $\|F\| = 1$ and $F(x_0) = \|x_0\|$.

Proof: Let $L$ be the linear subspace of $X$ generated by taking the linear span
of $x_0$. All elements in $L$ will thus have a representation $\alpha x_0$, $\alpha \in \mathbb{R}$. Define
the function $f$ on $L$ by $f(\alpha x_0) = \alpha \|x_0\|$. It is seen at once that $f(x_0) = \|x_0\|$
simply by taking $\alpha = 1$. We can then extend $f$ to a bounded linear functional
$F$ defined on the whole space $X$ as noted in the previous theorem. Since $F = f$
on $L$, $F(x_0) = f(x_0) = \|x_0\|$. It thus remains only to show that $\|F\| = 1$. For
any $x = \alpha x_0 \in L$, we see that

$$ |f(x)| = |f(\alpha x_0)| = |\alpha| \|x_0\| = \|\alpha x_0\| = \|x\|, $$

implying that $\|f\| = 1$ and therefore $\|F\| = 1$ by the previous theorem.
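In a Euclidean space the functional of Theorem 4 can be written down explicitly, which gives a quick sanity check. The sketch below assumes $X = \mathbb{R}^2$ with the Euclidean norm and takes $F(x) = \langle x, x_0 \rangle / \|x_0\|$; this closed form is an assumption specific to Hilbert spaces and is not the general construction used in the proof. The names `norming_functional` and `op_norm` are illustrative.

```python
import math

def norm(x):
    """Euclidean norm on R^n."""
    return math.sqrt(sum(t * t for t in x))

def norming_functional(x0):
    """Return F(x) = <x, x0> / ||x0||, defined on all of R^n."""
    n0 = norm(x0)
    return lambda x: sum(a * b for a, b in zip(x, x0)) / n0

x0 = (3.0, 4.0)
F = norming_functional(x0)

# Estimate ||F|| as the largest |F(x)| over sampled points of the unit circle.
unit = [(math.cos(2 * math.pi * k / 3600), math.sin(2 * math.pi * k / 3600))
        for k in range(3600)]
op_norm = max(abs(F(x)) for x in unit)
```

Here `F(x0)` equals $\|x_0\| = 5$, and the sampled estimate `op_norm` is very close to 1 without exceeding it (up to rounding), in line with the theorem.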

To prove our assertion about the strict separation of elements in a normed linear
space by the functionals defined on that space, let $X$ be a normed linear space
and $x_1$ and $x_2$ be distinct elements in $X$. Define $x_0 = x_1 - x_2$ and see that
$x_0 \ne 0$ since $x_1$ and $x_2$ are distinct. We may now apply the previous theorem
to obtain an $f \in X^*$ with

$$ f(x_1) - f(x_2) = f(x_1 - x_2) = f(x_0) = \|x_0\| \ne 0. $$

4. The Stone-Weierstrass Theorem and Uniform Approximation

In many applications, it is desirable to know whether a certain class of
$\mathbb{R}$-valued functions may be useful in uniformly approximating a larger group of
$\mathbb{R}$-valued functions. Weierstrass proved that it is possible to uniformly
approximate any continuous real-valued function on a compact subset of $\mathbb{R}^n$ by a
polynomial in n variables. Since that time, there have been several different proofs
of Weierstrass' theorem. One of the most useful is the one given by M. H. Stone in
[23]. His primary result, which will be shown, generalizes Weierstrass' result in
that it allows the domain to be any compact set (instead of just any compact
subset of $\mathbb{R}^n$) and the set of approximating functions to be a set other than
polynomials (which may not have meaning on a general compact set).

In order to generalize the theorem, we can view the polynomials as a
subset of the set from which we obtain the approximating functions. We seek to
know what functions may be derived from a certain set of prescribed functions by
the specified algebraic operations of addition, multiplication, multiplication by
real numbers and uniform passage to the limit. The set of prescribed functions
for the polynomials, for example, consists of just two functions: $f_1(x) = 1$
and $f_2(x) = x$ defined on a bounded closed interval $X$ of $\mathbb{R}$. From these two
functions and the algebraic operations alone, the set of all polynomials may
be formed. Weierstrass' theorem then tells us that the uniform passage to the
limit of this set (the polynomials) is the set of all continuous real-valued functions
on $X$. Equivalently, the set of continuous functions is the uniform closure of the
set of polynomials, or the continuous functions on $X$ may be uniformly approximated
by the set of polynomials.
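As a concrete illustration of uniform approximation by polynomials on a bounded closed interval, one can use Bernstein polynomials. This is a different, constructive technique from Stone's argument and is included only as a sketch; the interval $[0, 1]$, the test function $|x - 1/2|$, and the names `bernstein` and `sup_error` are assumptions made for the example.

```python
from math import comb

def bernstein(f, n):
    """Return the degree-n Bernstein polynomial of f on [0, 1]."""
    def b(x):
        return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
                   for k in range(n + 1))
    return b

def sup_error(f, g, samples=201):
    """Largest sampled value of |f - g| on [0, 1]."""
    return max(abs(f(i / (samples - 1)) - g(i / (samples - 1)))
               for i in range(samples))

f = lambda x: abs(x - 0.5)   # continuous but not differentiable at 1/2
err_50 = sup_error(f, bernstein(f, 50))
err_200 = sup_error(f, bernstein(f, 200))
```

The sampled sup-norm error shrinks as the degree grows (`err_200` is smaller than `err_50`), which is the behaviour Weierstrass' theorem guarantees in the limit.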

In order to begin proving this generalized theorem, it is instructive to
consider the case of a general topological space $X$ where the specified algebraic
operations are the lattice operations $\vee$ and $\wedge$ defined to be:

$$ f \vee g = \max(f, g) \quad\text{and}\quad f \wedge g = \min(f, g). $$

These form the functions $h$ and $k$ defined as:

$$ h(x) = \max(f(x), g(x)) \quad\text{and}\quad k(x) = \min(f(x), g(x)) $$

for any $x \in X$. Let $C$ be the set of all continuous real functions on $X$ and
$C_0$ be a prescribed subfamily of $C$. We want to obtain the family $U(C_0)$ of all
functions which can be formed from the functions in $C_0$ by the application of
the specified algebraic operations and uniform passage to the limit. In the case
of the lattice operations, it is easily observed that $U(C_0)$ is a part of $C$ closed
under uniform passage to the limit, that is

$$ U(C_0) \subset C, \qquad U(U(C_0)) = U(C_0). $$

The first property may be shown by observing that the mappings

$$ x \mapsto \max(f(x), g(x)) \quad\text{and}\quad x \mapsto \min(f(x), g(x)) $$

are continuous. This follows from the continuity of $f$ and $g$ (necessarily true
since $C_0$ is a subfamily of $C$) and the continuity of the max and min mappings.
Now since the uniform limit of continuous functions is also a continuous function,
clearly $U(C_0) \subset C$. To show that $U(U(C_0)) = U(C_0)$, we can form $U(C_0)$ in
two steps. First, let $U_1(C_0)$ be the set containing all the functions obtained by
applying the lattice operations alone to the functions in $C_0$. Then let $U_2(C_0)$
be the set consisting of the functions obtained from those in $U_1(C_0)$ by uniform
passage to the limit. Clearly,

$$ C_0 \subset U_1(C_0) \subset U_2(C_0) \subset U(C_0). $$

It remains to show that $U_2(C_0)$ is closed under the allowable operations, and
therefore $U_2(C_0) = U(C_0)$. Let $f$ be a function which is a uniform limit of
functions $f_n$ in $U_2(C_0)$. Then $f$ must also be in $U_2(C_0)$: given $\varepsilon > 0$,
for each $n$ there exists a function $g_n$ in $U_1(C_0)$ such that $|f_n - g_n| < \varepsilon/2$, since
$U_2(C_0)$ is, by definition, the set of functions obtained by passing those in $U_1(C_0)$
to a uniform limit. Also, $|f - f_n| < \varepsilon/2$ for all sufficiently large $n$, since $f$ is
the uniform limit of the $f_n$. Therefore, $|f - g_n| < \varepsilon$, so $f$ is a uniform limit of
functions $g_n$ in $U_1(C_0)$ and therefore a member of $U_2(C_0)$. We must now show that
whenever $f$ and $g$ are in $U_2(C_0)$, then so are $f \vee g$ and $f \wedge g$. This can be done
by observing that if $f$ and $g$ are uniform limits of functions $f_n$ and $g_n$ in $U_1(C_0)$,
then $f \vee g$ and $f \wedge g$ are uniform limits of $f_n \vee g_n$ and $f_n \wedge g_n$, respectively.
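The last observation rests on the pointwise inequality $|\max(a, b) - \max(c, d)| \le \max(|a - c|, |b - d|)$ and its analogue for min, so the lattice operations never increase the uniform error. A small numerical sketch follows; the choice of $f = \sin$, $g = \cos$ on $[0, 1]$ and of truncated Taylor series as the approximants $f_n$, $g_n$ is purely illustrative.

```python
import math

def sup_dist(f, g, xs):
    """Largest sampled value of |f - g| over the points xs."""
    return max(abs(f(x) - g(x)) for x in xs)

xs = [i / 500 for i in range(501)]          # sample points in [0, 1]
f, g = math.sin, math.cos
# Hypothetical approximants: truncated Taylor series of sin and cos.
fn = lambda x: x - x**3 / 6
gn = lambda x: 1 - x**2 / 2 + x**4 / 24

join = lambda a, b: (lambda x: max(a(x), b(x)))   # a v b
meet = lambda a, b: (lambda x: min(a(x), b(x)))   # a ^ b

# Worst of the two individual approximation errors.
eps = max(sup_dist(f, fn, xs), sup_dist(g, gn, xs))
join_err = sup_dist(join(f, g), join(fn, gn), xs)
meet_err = sup_dist(meet(f, g), meet(fn, gn), xs)
```

Both `join_err` and `meet_err` are bounded by `eps`, so if $f_n \to f$ and $g_n \to g$ uniformly, the joins and meets converge uniformly as well.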

Theorem 5 Let $X$ be a compact space, $C$ the family of all continuous real
functions on $X$, $C_0$ an arbitrary subfamily of $C$, and $U(C_0)$ the family of all
functions (necessarily continuous) generated from $C_0$ by the lattice operations
and uniform passage to the limit. Then a necessary and sufficient condition for
a function $f$ in $C$ to be in $U(C_0)$ is that, whatever the points $x, y \in X$ and
whatever the positive number $\varepsilon$, there exists a function $f_{xy}$ obtained by applying
the lattice operations alone to $C_0$ and such that

$$ |f(x) - f_{xy}(x)| < \varepsilon \quad\text{and}\quad |f(y) - f_{xy}(y)| < \varepsilon. $$

Proof: The necessity is obvious. A proof of the sufficiency, which is not
complicated, is given in [23]. There, Stone also notes the following corollary to the
theorem.

Corollary 1: If $C_0$ has the property that, whatever the points $x, y \in X$, $x \ne y$,
and whatever the real numbers $\alpha$ and $\beta$, there exists a function $f_0$ in $C_0$ for
which $f_0(x) = \alpha$ and $f_0(y) = \beta$, then $U(C_0) = C$.

This tells us that the way in which a function $f$ acts on pairs of points in
$X$ determines whether it can be approximated by functions in $U(C_0)$. This
observation leads to the following theorem.

Theorem 6 Let $X$ be a compact space, $C$ the family of all continuous
(necessarily bounded) real functions on $X$, $C_0$ an arbitrary subfamily of $C$ and $U(C_0)$
the family of all functions (necessarily continuous) generated from $C_0$ by the
linear lattice operations and uniform passage to the limit. Then a necessary
and sufficient condition for a function $f$ in $C$ to be in $U(C_0)$ is that $f$ satisfy
every linear relation of the form $\alpha g(x) = \beta g(y)$, $\alpha\beta \ge 0$, which is satisfied by
all functions in $C_0$. The linear relations associated with an arbitrary pair of
points $x$, $y$ in $X$ must be equivalent to one of the following distinct types:

1. $g(x) = 0$ and $g(y) = 0$;

2. $g(x) = 0$ and $g(y)$ unrestricted, or vice versa;

3. $g(x) = g(y)$ without restriction on the common value;

4. $g(x) = \lambda g(y)$ or $g(y) = \lambda g(x)$ for a unique value $\lambda$, $0 < \lambda < 1$.

Corollary 1: In order that $U(C_0)$ contain a nonvanishing constant function, it
is necessary and sufficient that the only linear relations of the form
$\alpha g(x) = \beta g(y)$, $\alpha\beta \ge 0$, satisfied by every function in $C_0$ be those reducible to
the form $g(x) = g(y)$.

Proof: It is obvious that when $U(C_0)$ contains a nonvanishing constant
function then conditions (1), (2), and (4) can never be satisfied, so only (3) must be
considered.

Corollary 2: In order that $U(C_0) = C$, it is sufficient that the functions in $C_0$
satisfy no linear relation of the form (1)-(4) of Theorem 6.

This is an important corollary because in practice it is often easy to
verify that a given family of functions satisfies none of the relations (1)-(4).

Definition : A family of arbitrary functions on a domain $X$ is said to be a
separating family (for that domain) if, whenever $x$ and $y$ are distinct points
of $X$, there is some function $f$ in the family with distinct values $f(x)$, $f(y)$ at
these points.

Corollary 3: If $X$ is compact and if $C_0$ is a separating family for $X$ and contains
a nonvanishing constant function, then $U(C_0) = C$.

Proof: Since $C_0$ contains a nonvanishing constant function, the only relations it
may satisfy are those of condition (3) of Theorem 6. However, since $C_0$ is a
separating family, for any distinct $x$, $y$ in $X$ there is a function $f \in C_0$ such that
$f(x) \ne f(y)$. So condition (3) is not satisfied by all functions in $C_0$. Therefore
none of the conditions are satisfied by $C_0$ and therefore $U(C_0) = C$.

We now consider the case where $U(C_0)$ is built from the functions in
$C_0 \subset C$ using the operations of addition, multiplication, multiplication by real
numbers (the linear ring operations), and uniform passage to the limit. If $f$ and
$g$ are uniform limits of the sequences $f_n$ and $g_n$ respectively, the product $fg$ is
not in general the uniform limit of the sequence $f_n g_n$. We therefore require that
the set $C$ consist of the bounded continuous functions on $X$. Of course, this is
satisfied automatically when $X$ is compact. This leads to the general theorem.

Theorem 7 Let $X$ be a compact space, $C$ the family of all continuous
(necessarily bounded) functions on $X$, $C_0$ an arbitrary subfamily of $C$ and $U(C_0)$
the family of all functions generated from $C_0$ by the linear ring operations and
uniform passage to the limit. Then a necessary and sufficient condition for a
function $f$ in $C$ to be in $U(C_0)$ is that $f$ satisfy every linear relation of the
form $g(x) = 0$ or $g(x) = g(y)$ which is satisfied by all functions in $C_0$.

Proof: As a lemma, one can show (see [23]) that if $f$ is in $U(C_0)$ then so is
$|f|$. That is, $|f|$, like $f$, is a uniform limit of functions obtained from $C_0$ by the
linear ring operations. Using a well known representation of the min and max
functions:

$$ \max(a, b) = \tfrac{1}{2}(a + b + |a - b|) $$

$$ \min(a, b) = \tfrac{1}{2}(a + b - |a - b|) $$

we can now see that whenever $f$ and $g$ are in $U(C_0)$ then $f \vee g$ and $f \wedge g$
are in $U(C_0)$ as well. So $U(C_0)$ is closed under the linear lattice operations as
well as the linear ring operations and uniform passage to the limit. Therefore
the results in Theorem 6 are applicable here. It remains to show that the
functions in $U(C_0)$ cannot all satisfy a linear relation of the form given in condition
(4) of Theorem 6. Assume $g(x) = \lambda g(y)$ for every function $g$ in $U(C_0)$, for some
pair of points $x$, $y$ in $X$ and some $0 < \lambda < 1$. Then for every $f$ in $U(C_0)$, $f^2$ is
also in $U(C_0)$, so $f^2(x) = \lambda f^2(y)$, while squaring $f(x) = \lambda f(y)$ gives
$f^2(x) = \lambda^2 f^2(y)$. Hence $\lambda f^2(y) = \lambda^2 f^2(y)$, implying that either
$f(y) = 0$ for every $f$ in $U(C_0)$ or $\lambda = 0, 1$, the second being a contradiction to
the assumption. So we conclude that $f$ is in $U(C_0)$ if and only if it satisfies all
relations of the form $g(x) = 0$ or $g(x) = g(y)$ satisfied by those functions in $C_0$.
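The representation of max and min used in the proof can be checked directly: it expresses the lattice operations through addition, scalar multiplication, and the absolute value only. The sketch below uses two arbitrary illustrative functions on $[0, 1]$; the helper names `vmax` and `vmin` are not from the source.

```python
def vmax(f, g):
    """f v g written with ring operations and absolute value only."""
    return lambda x: 0.5 * (f(x) + g(x) + abs(f(x) - g(x)))

def vmin(f, g):
    """f ^ g written with ring operations and absolute value only."""
    return lambda x: 0.5 * (f(x) + g(x) - abs(f(x) - g(x)))

f = lambda x: x * x        # illustrative choice
g = lambda x: 1.0 - x      # illustrative choice
xs = [i / 100 for i in range(101)]

# Largest disagreement with the built-in max and min on the sample points.
max_gap = max(abs(vmax(f, g)(x) - max(f(x), g(x))) for x in xs)
min_gap = max(abs(vmin(f, g)(x) - min(f(x), g(x))) for x in xs)
```

Both gaps are zero up to floating-point rounding, which is why closure of $U(C_0)$ under $f \mapsto |f|$ is enough to obtain closure under $\vee$ and $\wedge$.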

We  give  a  definition  in  order  to  restate  the  general  theorem. 

Definition : A family $A$ of real functions defined on a set $X$ is said to be an
algebra if (i) $f + g \in A$, (ii) $fg \in A$, and (iii) $cf \in A$ for all $f \in A$, $g \in A$ and
for all real constants $c$, that is, if $A$ is closed under addition, multiplication, and
multiplication by real numbers.

An equivalent form of the general theorem that is often used in practice
is stated in [19] as follows:

Theorem 8 Let $A$ be an algebra of real continuous functions on a compact set
$K$. If $A$ separates points of $K$ and if $A$ does not vanish at any point in $K$, then
any real continuous function on $K$ may be uniformly approximated by elements of $A$.

An  argument  in  [4]  extends  the  theorem  to  certain  normed  linear  spaces  that 
are  not  necessarily  compact. 

Theorem 9 Let $X$ be a normed linear space (or, indeed, any Hausdorff
topological space). If $A$ is a subalgebra of $C(X)$, the continuous functions on $X$,
that contains constants and separates the points of $X$, then $A$ is dense in $C(X)$
in the sense of uniform approximation on compact sets.

Proof: Let $f$ be any element of $C(X)$. We must prove that each neighborhood
of $f$ contains an element of $A$. Let $K$ be a compact set in $X$ and $\varepsilon$ a positive
number. By restricting $f$ and all members of $A$ to the compact set $K$, we
can apply the classical version of the Stone-Weierstrass Theorem in $C(K)$. Its
conclusion is that the set

$$ \{ g|_K : g \in A \} $$

is dense in $C(K)$. Hence there is an element $g$ in $A$ such that $\|f - g\|_K < \varepsilon$.

Now  we  give  some  examples  from  Stone's  original  article. 

Theorem 10 Let $X$ be an arbitrary bounded closed subset of n-dimensional
Cartesian space, the coordinates of a general point being $x_1, \ldots, x_n$. Any
continuous real function $f$ defined on $X$ can be uniformly approximated by
polynomials in the variables $x_1, \ldots, x_n$. In case the origin $x = (0, \ldots, 0)$ belongs to
$X$, the function $f$ can be uniformly approximated by polynomials vanishing at the
origin if and only if $f$ itself vanishes at the origin. Otherwise $f$ can be uniformly
approximated by such polynomials without qualification.

This  is  the  classical  approximation  theorem  proved  by  Weierstrass. 

Theorem 11 Let $f$ be an arbitrary continuous real function of the real variable
$\theta$, $0 \le \theta \le 2\pi$, subject to the periodicity condition $f(0) = f(2\pi)$. Then $f$
can be uniformly approximated on its domain of definition by trigonometric
polynomials of the form

$$ p(\theta) = \frac{a_0}{2} + \sum_{n=1}^{N} (a_n \cos n\theta + b_n \sin n\theta). $$
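One standard way to produce such trigonometric polynomials explicitly is through Fejer means of the Fourier series. This is not the route taken by Stone's proof and is shown only as a sketch; the test function $|\theta - \pi|$, the Riemann-sum quadrature, and the names `fourier_coeffs` and `fejer` are assumptions made for the example.

```python
import math

def fourier_coeffs(f, N, m=2048):
    """Approximate a_n, b_n (n = 0..N) of f on [0, 2*pi] by a Riemann sum."""
    ts = [2 * math.pi * i / m for i in range(m)]
    a = [2 / m * sum(f(t) * math.cos(n * t) for t in ts) for n in range(N + 1)]
    b = [2 / m * sum(f(t) * math.sin(n * t) for t in ts) for n in range(N + 1)]
    return a, b

def fejer(f, N):
    """Degree-N trigonometric polynomial: Fejer mean of the Fourier series."""
    a, b = fourier_coeffs(f, N)
    def p(t):
        s = a[0] / 2
        for n in range(1, N + 1):
            w = 1 - n / (N + 1)          # Cesaro weight of the n-th harmonic
            s += w * (a[n] * math.cos(n * t) + b[n] * math.sin(n * t))
        return s
    return p

f = lambda t: abs(t - math.pi)           # continuous, with f(0) = f(2*pi)
grid = [2 * math.pi * i / 400 for i in range(401)]
p8, p64 = fejer(f, 8), fejer(f, 64)
err_8 = max(abs(f(t) - p8(t)) for t in grid)
err_64 = max(abs(f(t) - p64(t)) for t in grid)
```

The sampled uniform error decreases from degree 8 to degree 64, consistent with the existence of uniformly convergent trigonometric polynomials asserted by the theorem.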


Theorem 12 Any continuous real function $f$, which is defined on the interval
$0 \le x < \infty$ and vanishes at infinity in the sense that $\lim_{x \to \infty} f(x) = 0$, can be
approximated by functions of the form $e^{-\alpha x} p(x)$ where $p(x)$ is a polynomial.

Theorem 13 Any continuous real function $f$ which is defined on the interval
$-\infty < x < +\infty$ and which vanishes at infinity in the sense that

$$ \lim_{x \to -\infty} f(x) = \lim_{x \to +\infty} f(x) = 0 $$

can be uniformly approximated by functions of the form $e^{-\alpha^2 x^2} p(x)$ where $p(x)$
is a polynomial.

Several of these examples will prove useful shortly.

5. Neural Network Approximation of Continuous Maps

We will now examine a structure that has proven useful for
approximation. The structure will be based almost entirely on a proof in [20]. We
assume that we have a normed linear space $X$ and a subset $C$ that is nonempty
and compact. We let $X^*$ represent the set of bounded linear functionals on $X$
and $Y$ represent a set of continuous maps which are dense in $X^*$ on $C$ in the
usual sense. That is, for each $\phi \in X^*$ and each $\varepsilon > 0$, there exists a $y \in Y$
such that $|\phi(x) - y(x)| < \varepsilon$ for $x \in C$. Further, for $k = 1, 2, 3, \ldots$ we let $D_k$ be
any family of continuous maps $h : \mathbb{R}^k \to \mathbb{R}$ such that given a compact $E \subset \mathbb{R}^k$
and any continuous $g : E \to \mathbb{R}$ as well as a $\sigma > 0$ there exists an $h \in D_k$ such that
$|g(x) - h(x)| < \sigma$ for $x \in E$. Let $U$ be any set of continuous maps $u : \mathbb{R} \to \mathbb{R}$
such that given $\sigma > 0$ and any bounded interval $(\beta_1, \beta_2) \subset \mathbb{R}$ there exists a
finite number of elements $u_1, \ldots, u_\ell$ of $U$ for which $|\exp(\beta) - \sum_j u_j(\beta)| < \sigma$ for
$\beta \in (\beta_1, \beta_2)$.

Theorem 14 (Sandberg) Let $f : C \to \mathbb{R}$. Then the following conditions are
equivalent.

(i) $f$ is continuous.

(ii) Given $\varepsilon > 0$ there are a positive integer $k$, real numbers $c_1, \ldots, c_k$, elements
$u_1, \ldots, u_k$ of $U$, and elements $y_1, \ldots, y_k$ of $Y$ such that

$$ \Big| f(x) - \sum_{j} c_j u_j[y_j(x)] \Big| < \varepsilon $$

for $x \in C$.

(iii) Given $\varepsilon > 0$ there are a positive integer $k$, elements $y_1, \ldots, y_k$ of $Y$, and
an $h \in D_k$ such that

$$ | f(x) - h[y_1(x), \ldots, y_k(x)] | < \varepsilon $$

for $x \in C$.

Proof: First, assume condition (i) holds. Let $V$ be the set of all functions
$v : C \to \mathbb{R}$ such that

$$ v(x) = \sum_{j} a_j \exp(\phi_j(x)), $$

in which the sum is finite and $a_j \in \mathbb{R}$ and $\phi_j \in X^*$. To see that $V$ constitutes
an algebra as defined above, observe that

$$ \exp(\phi(x)) \exp(\psi(x)) = \exp(\phi(x) + \psi(x)) = \exp((\phi + \psi)(x)). $$

Taking $\phi = 0$ we can see that $V$ contains constants. Finally, we have
demonstrated previously that the Hahn-Banach theorem guarantees that for any
distinct $x$ and $y$ in $C$ we can choose a $\phi \in X^*$ such that $\phi(x - y) \ne 0$. Therefore,
$\exp(\phi(x)) \ne \exp(\phi(y))$, so $V$ separates the points of $C$. We may now apply the
Stone-Weierstrass theorem guaranteeing uniform approximation on compacta. In
other words, for $\varepsilon > 0$, there are a positive integer $n$, real numbers $d_1, \ldots, d_n$,
and elements $z_1, \ldots, z_n$ of $X^*$ such that

$$ \Big| f(x) - \sum_{j=1}^{n} d_j \exp(z_j(x)) \Big| < \varepsilon/3 $$

for $x \in C$.
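The closure of $V$ under multiplication, which is what makes it an algebra, can be made concrete in a finite-dimensional setting. In the sketch below it is assumed that $X = \mathbb{R}^2$ and that each functional $\phi_j$ is represented by a weight vector $w_j$ with $\phi_j(x) = w_j \cdot x$; the list-of-pairs representation and the function names are illustrative only.

```python
import math
from itertools import product

# A member of V on X = R^2 is stored as a list of pairs (a_j, w_j), meaning
# v(x) = sum_j a_j * exp(w_j . x), where w_j represents the functional phi_j.

def evaluate(v, x):
    """Evaluate v(x) for v given as a list of (coefficient, weight) pairs."""
    return sum(a * math.exp(w[0] * x[0] + w[1] * x[1]) for a, w in v)

def multiply(v1, v2):
    """Product of two members of V, again expressed in the (a_j, w_j) form."""
    return [(a * b, (w[0] + u[0], w[1] + u[1]))
            for (a, w), (b, u) in product(v1, v2)]

v1 = [(2.0, (1.0, 0.0)), (-1.0, (0.5, 0.5))]
v2 = [(3.0, (0.0, -1.0)), (0.5, (0.0, 0.0))]   # last term is the constant 0.5
x = (0.3, -0.7)

lhs = evaluate(v1, x) * evaluate(v2, x)   # pointwise product of v1 and v2
rhs = evaluate(multiply(v1, v2), x)       # the same function, as a member of V
```

The two values agree up to rounding, because each product of exponentials is the exponential of the sum of the two functionals, exactly as in the identity above. The zero weight vector plays the role of $\phi = 0$ and supplies the constants.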

Assume that $\sum_j |d_j| \ne 0$. Choose $\gamma > 0$ such that $\gamma \sum_j |d_j| < \varepsilon/3$. Let
$[a', b']$ be an interval in $\mathbb{R}$ that contains all of the sets $z_j(C)$, and let $a \in \mathbb{R}$
and $b \in \mathbb{R}$ be such that $a < a'$ and $b > b'$. That is, the interval $[a, b]$ contains the
interval $[a', b']$. Now, choose $\nu > 0$ such that $|\exp(\beta_1) - \exp(\beta_2)| < \gamma$ for $\beta_1$,
$\beta_2 \in [a, b]$ with $|\beta_1 - \beta_2| < \nu$. Clearly this is possible because of the uniform
continuity of the exponential function on $[a, b]$. Set $\rho = \min(\nu, a' - a, b - b')$ and
choose $y_j \in Y$ such that $|z_j(x) - y_j(x)| < \rho$, $x \in C$, for all $j$. Then each $y_j(x)$
lies in $[a, b]$, and this gives $|\exp(z_j(x)) - \exp(y_j(x))| < \gamma$, $x \in C$, for each $j$.
Now using a version of the triangle inequality, this gives:

$$ \Big| f(x) - \sum_j d_j \exp(y_j(x)) \Big| \le \Big| f(x) - \sum_j d_j \exp(z_j(x)) \Big| + \Big| \sum_j d_j \exp(z_j(x)) - \sum_j d_j \exp(y_j(x)) \Big| $$

$$ < \varepsilon/3 + \sum_j |d_j| \, | \exp(z_j(x)) - \exp(y_j(x)) | < 2\varepsilon/3, $$

for $x \in C$.

Now we choose $u_1, \ldots, u_\ell \in U$ so that

$$ \Big| \exp(\beta) - \sum_i u_i(\beta) \Big| < \gamma_1, \quad \beta \in [a, b], $$

where $\gamma_1 \sum_j |d_j| < \varepsilon/3$. Then,

$$ \Big| f(x) - \sum_j \sum_i d_j u_i[y_j(x)] \Big| \le \Big| f(x) - \sum_j d_j \exp[y_j(x)] \Big| + \Big| \sum_j d_j \exp[y_j(x)] - \sum_j \sum_i d_j u_i[y_j(x)] \Big| $$

$$ < 2\varepsilon/3 + \sum_j |d_j| \, \Big| \exp[y_j(x)] - \sum_i u_i[y_j(x)] \Big| \le 2\varepsilon/3 + \gamma_1 \sum_j |d_j| < \varepsilon. $$

Now, since $\sum_j \sum_i d_j u_i[y_j(x)]$ is equivalent to $\sum_j c_j u_j[y_j(x)]$ with the $c_j$,
$u_j$, and $y_j$ in $\mathbb{R}$, $U$, and $Y$, respectively, we have shown that (i) $\to$ (ii).

Figure 3: A general structure for approximation

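To show that (ii) $\to$ (iii), let $\varepsilon > 0$ and suppose that there exist $k$,
$c_1, \ldots, c_k$, $u_1, \ldots, u_k$, and $y_1, \ldots, y_k$ such that

$$ \Big| f(x) - \sum_j c_j u_j[y_j(x)] \Big| < \varepsilon/2, \quad x \in C. $$

Let $h \in D_k$ satisfy $|h(\lambda) - \sum_j c_j u_j(\lambda_j)| < \varepsilon/2$ for $\lambda = (\lambda_1, \ldots, \lambda_k) \in [a, b]^k$,
where $[a, b]$ is an interval containing every $y_j(C)$. Then

$$ | f(x) - h[y_1(x), \ldots, y_k(x)] | \le \Big| f(x) - \sum_j c_j u_j[y_j(x)] \Big| + \Big| \sum_j c_j u_j[y_j(x)] - h[y_1(x), \ldots, y_k(x)] \Big| < \varepsilon/2 + \varepsilon/2 = \varepsilon $$

for $x \in C$.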

Finally, (iii) $\to$ (i) as $f$ is a uniform limit of continuous functions and
therefore continuous itself.

This  proof  has  demonstrated  a  general  structure  that  may  be  used  for 
approximation.  This  structure  is  shown  below  in  Figure  3. 

Part (iii) of the theorem shows that the $y_j$'s are simply functions which
are capable of approximating linear functionals defined on the space $X$ (these
may actually be linear functionals themselves) while the structure for $h$ is simply
a continuous memoryless nonlinear system capable of approximating uniformly
on compacta in $\mathbb{R}^k$. In other words, the problem of approximating a function
whose domain may be any compact subset of any normed linear space has been
reduced to the problem of approximating a function on $\mathbb{R}^k$, a subject about
which a great deal is known, and has been shown to some extent in dealing
with the Stone-Weierstrass theorem. Stiles, Sandberg, and Ghosh have shown
in [22] that structures of a similar form have use in the approximation of certain
nonlinear discrete time mappings as well.

Part (ii) of the theorem gives a specific example of the structure of
the network. Again it takes the $y_j$'s to be uniform approximations of linear
functionals on $X$. Here one possible structure for $h$ is shown below in Figure
4. The $u_j$'s, as mentioned before, are drawn from a set capable of uniform
approximation of the exponential function on a bounded set in $\mathbb{R}$. In the
simplest case, from the perspective of the theorem, each $u_j$ may be taken to be
the function $\exp(\cdot)$.
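The structure in part (ii) is a single hidden layer feedforward network: linear maps $y_j$, scalar nonlinearities $u_j$, and output weights $c_j$. A minimal forward-pass sketch follows, assuming $X = \mathbb{R}^2$, each $y_j$ given by a weight vector, and $u_j = \exp$. All numerical values are hypothetical and are not fitted to any target function.

```python
import math

def network(x, weights, coeffs, units):
    """Evaluate sum_j c_j * u_j(y_j(x)) with y_j(x) = w_j . x."""
    total = 0.0
    for w, c, u in zip(weights, coeffs, units):
        y = sum(wi * xi for wi, xi in zip(w, x))   # y_j(x): linear functional
        total += c * u(y)                          # hidden unit, then output weight
    return total

weights = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # hypothetical y_1, y_2, y_3
coeffs = [0.5, -2.0, 1.0]                        # hypothetical c_1, c_2, c_3
units = [math.exp] * 3                           # simplest choice u_j = exp

value_at_origin = network((0.0, 0.0), weights, coeffs, units)
```

At the origin every $y_j$ vanishes, so the output reduces to $c_1 + c_2 + c_3 = -0.5$. Replacing `math.exp` by another member of $U$ changes only the hidden-unit nonlinearity and leaves the overall structure intact.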

In a moment we will determine possible choices for the elements $u_j$ in
the approximation network. Now we will look at a similar method of dealing
with this problem given in [4], [7], and [24]. We start by defining a certain class
of functions, called ridge functions, and then immediately give the theorem.

Figure 4: A structure for h

Definition : A function $f : X \to \mathbb{R}$ is called a ridge function if it may be
represented in the form $f = g \circ \phi$, where $g : \mathbb{R} \to \mathbb{R}$ and $\phi \in X^*$, where $X^*$ is
the space of continuous linear functionals on $X$. An alternative equivalent form
of this composite function is $f(x) = g(\phi(x))$ for $x \in X$.

It can easily be shown, for example, that all ridge functions on $\mathbb{R}^n$ can
be written in the form

$$ f(x) = g(a_1 \xi_1 + a_2 \xi_2 + \cdots + a_n \xi_n) $$

where $x = (\xi_1, \xi_2, \ldots, \xi_n) \in \mathbb{R}^n$.
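The name comes from the fact that such a function is constant along every direction orthogonal to the coefficient vector $a = (a_1, \ldots, a_n)$. A short sketch in $\mathbb{R}^2$, with $g = \tanh$ chosen purely for illustration and `ridge` a hypothetical helper:

```python
import math

def ridge(g, a):
    """Return f(x) = g(a . x), a ridge function on R^n."""
    return lambda x: g(sum(ai * xi for ai, xi in zip(a, x)))

a = (1.0, 2.0)
f = ridge(math.tanh, a)      # g = tanh is an arbitrary illustrative choice
d = (2.0, -1.0)              # a . d = 0, so d is orthogonal to a
x = (0.4, 0.1)

# Move x along the direction d; the value of f should not change.
shifted = tuple(xi + 3.0 * di for xi, di in zip(x, d))
difference = abs(f(x) - f(shifted))
```

The value of `difference` is zero up to rounding, since $a \cdot (x + 3d) = a \cdot x$.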

Theorem 15 (Cheney) Let $G$ be a fundamental set in $C(\mathbb{R})$$^{2}$ and let $X$ be a
normed linear space. Let $\Phi$ be a subset of $X^*$ such that the set

$$ \{ \phi / \|\phi\| : \phi \in \Phi, \ \phi \ne 0 \} $$

is dense in the unit sphere of $X^*$. Then the set of ridge functions
$\{ g \circ \phi : g \in G, \ \phi \in \Phi \}$ is fundamental in $C(X)$.$^{3}$


2 A subset $Y$ of $X$ is said to be fundamental in $X$ if its linear span is dense in $X$. Thus,
for any $x \in X$ and $\varepsilon > 0$ there are elements $y_1, \ldots, y_n \in Y$ and $c_j \in \mathbb{R}$ such that
$\| x - \sum_{j=1}^{n} c_j y_j \| < \varepsilon$.

3 $C(X)$ is, of course, the set of continuous, real-valued functions on the normed linear space
$X$.
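
Proof: Let $f$ be a member of $C(X)$, $C$ a compact set in $X$, and $\varepsilon > 0$. We
have shown above that there exist $u_j \in C(\mathbb{R})$ and $y_j \in X^*$ such that

$$ \Big| f(x) - \sum_{j=1}^{P} u_j(y_j(x)) \Big| < \varepsilon/3 $$

for $x \in C$. By adjusting the functions $u_j$ as necessary, we can assume that
$\|y_j\| = 1$ for $1 \le j \le P$. Let $M = \sup_{x \in C} \|x\|$. Choose $\delta > 0$ so that when
$|s| \le M$, $|t| \le M$, and $|s - t| < \delta$ we get $|u_j(s) - u_j(t)| < \varepsilon/3P$ for $1 \le j \le P$.
This is, of course, possible because the $u_j$ are continuous. Now select $\phi_j \in \Phi$ so
that $\| \phi_j / \|\phi_j\| - y_j \| < \delta/M$ for $1 \le j \le P$. Let $\lambda_j = 1/\|\phi_j\|$ and $\mu = \max_j \|\phi_j\|$.
Select $a_{jk} \in \mathbb{R}$ and $g_{jk} \in G$ so that for $|\tau| \le \mu M$ we have

$$ \Big| u_j(\lambda_j \tau) - \sum_{k=1}^{N} a_{jk} g_{jk}(\tau) \Big| < \varepsilon/3P \qquad (1 \le j \le P). $$

Now let $x \in C$. Then $\|x\| \le M$, $|y_j(x)| \le M$, $|\lambda_j \phi_j(x)| \le M$, and

$$ | y_j(x) - \lambda_j \phi_j(x) | \le \|x\| \, \| y_j - \lambda_j \phi_j \| < M (\delta/M) = \delta. $$

From the definition of $\delta$ (i.e., let $s = y_j(x)$ and $t = \lambda_j \phi_j(x)$) we get

$$ \Big| \sum_{j=1}^{P} u_j(y_j(x)) - \sum_{j=1}^{P} u_j(\lambda_j \phi_j(x)) \Big| < \sum_{j=1}^{P} \varepsilon/3P = \varepsilon/3. $$

Now, because $|\phi_j(x)| \le \|\phi_j\| \|x\| \le \mu M$, the definition of $a_{jk}$ and $g_{jk}$ gives

$$ \Big| \sum_{j=1}^{P} u_j(\lambda_j \phi_j(x)) - \sum_{j=1}^{P} \sum_{k=1}^{N} a_{jk} g_{jk}(\phi_j(x)) \Big| < \sum_{j=1}^{P} \varepsilon/3P = \varepsilon/3. $$

Now, by a simple application of the triangle inequality, we get

$$ \Big| f(x) - \sum_{j=1}^{P} \sum_{k=1}^{N} a_{jk} g_{jk}(\phi_j(x)) \Big| \le \Big| f(x) - \sum_{j=1}^{P} u_j(y_j(x)) \Big| + \Big| \sum_{j=1}^{P} u_j(y_j(x)) - \sum_{j=1}^{P} u_j(\lambda_j \phi_j(x)) \Big| + \Big| \sum_{j=1}^{P} u_j(\lambda_j \phi_j(x)) - \sum_{j=1}^{P} \sum_{k=1}^{N} a_{jk} g_{jk}(\phi_j(x)) \Big| < \varepsilon. $$

Since $\sum_{j=1}^{P} \sum_{k=1}^{N} a_{jk} g_{jk}(\phi_j(x))$ may be written, after relabeling, as
$\sum_{j=1}^{S} c_j g_j(\phi_j(x))$, we get the desired result:

$$ \Big| f(x) - \sum_{j=1}^{S} c_j g_j(\phi_j(x)) \Big| < \varepsilon \quad \text{for } x \in C. $$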

We note many similarities between this proof and part of Sandberg's.
The set of functions $G$ in Cheney's theorem is similar to the set of functions $U$
in Sandberg's, but the requirement in Sandberg's theorem on $U$ is less stringent.
The set $U$ is required only to approximate one specific function in $C(\mathbb{R})$, namely
the exponential function, $\exp(\beta)$, on a certain bounded set. Cheney's theorem,
on the other hand, requires that the set $G$ be fundamental in $C(\mathbb{R})$. This means
that any continuous function defined on a compact set in $\mathbb{R}$ can be uniformly
approximated by linear combinations of elements of $G$.
6. Approximation and Classification

As previously mentioned, the problem of classifying signals plays an
important role in a variety of problems. We attempt to provide the framework
for a solution to some of these problems by restating the problem in a more
mathematical sense.

We assume first that all of the signals to be classified are drawn from
a normed linear space. For simplicity, we will further assume that each signal
may belong only to one of the classes. For example, assume that there are $n$
different classes $C_1, \ldots, C_n$ that are all subsets of a normed linear space $X$, and
that each signal received must necessarily belong to exactly one of the classes.

We now have the framework whereby we can view the classifier as a
mathematical function $f$ that takes the signal to be classified as input and
produces the desired class as output. For example, if $x \in C_j$, then $f(x) = a_j$,
where $a_1, \ldots, a_n$ are distinct integers, would model a classification system
in which each element of class $C_j$ is mapped to the integer $a_j$. A graph of this
simple function is shown in Figure 5. Our assumption that each signal may
belong to only one class means that the sets $C_j$ are pairwise disjoint.

In order to apply the theorems that we have developed, it is helpful to
assume that the sets $C_j$ are compact. This assumption will, of course, exclude
certain classification problems from the scope of these theorems. We now can
let $C = \bigcup_{j=1}^{n} C_j$. The set $C$ will also be compact, as it is the union of a finite
number of compact sets. Finally, since the function $f$ is constant on
each set $C_j$ and the distance between any pair of disjoint compact sets is positive,
the function $f$ is continuous on $C$.

Figure 5: Representation of a classifying function

With these assumptions, we get the following:

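1. Given $\epsilon > 0$, there are a positive integer $k$, real numbers $c_1, \ldots, c_k$,
elements $y_1, \ldots, y_k \in Y$, and elements $u_1, \ldots, u_k$ of $U$ such that

$$a_j - \epsilon < \sum_{i=1}^{k} c_i u_i[y_i(x)] < a_j + \epsilon$$

for $x \in C_j$ and $j = 1, \ldots, n$.

2. Given $\epsilon > 0$, there are a positive integer $k$, elements $y_1, \ldots, y_k$ of $Y$ and an
$h \in D_k$ such that

$$a_j - \epsilon < h[y_1(x), \ldots, y_k(x)] < a_j + \epsilon$$

for $x \in C_j$ and $j = 1, \ldots, n$.

These follow directly from Sandberg's theorem applied to the continuous
function $f$ on the compact set $C$.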
Figure 6: A classifying network

This now allows us to use the above approximation network for the
purpose of classification. We require one additional element, a quantizer
$Q$. This quantizer is simply a real function $Q : \mathbb{R} \to \mathbb{R}$ that maps
numbers in the interval $(a_j - \epsilon, a_j + \epsilon)$ to $a_j$. As long as we choose
$\epsilon < 0.5 \min_{i \neq j} |a_i - a_j|$, these intervals do not overlap, and
this quantizer, when following a network of the structure defined above,
will allow the correct class to be output. This gives an entire structure for a
classification network. It is shown in Figure 6. The structure for $h$ as defined
in part (ii) of Sandberg's theorem is used in the figure.
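
As an illustration only, the quantizer described above can be sketched in a few lines of Python. The names make_quantizer and quantize are hypothetical, and the behavior for inputs that fall outside every interval around a label (raising an error) is an assumption of this sketch, since the text only specifies Q on those intervals.

```python
def make_quantizer(labels, eps):
    """Return Q mapping any value within eps of a label a_j to that label."""
    labels = sorted(labels)
    gaps = [b - a for a, b in zip(labels, labels[1:])]
    # The intervals (a_j - eps, a_j + eps) must not overlap.
    if gaps and eps >= 0.5 * min(gaps):
        raise ValueError("eps must be below half the smallest label gap")

    def quantize(value):
        nearest = min(labels, key=lambda a: abs(value - a))
        # Assumption of this sketch: reject values outside every interval.
        if abs(value - nearest) >= eps:
            raise ValueError("value is not within eps of any label")
        return nearest

    return quantize


Q = make_quantizer([1, 2, 3], eps=0.4)
print(Q(2.31))   # prints 2: the network output lies within eps of a_2 = 2
```

Composing such a Q with any approximating network whose error on C is below eps reproduces the class label exactly, which is the role Q plays in Figure 6.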

We now turn to demonstrating some acceptable choices for the hidden
elements in our classification network. In all cases, the complete structure of
the network is as in Figure 6. No assumption is made about the number of
hidden elements that are necessary or about the determination of the constants $c_j$. We are
concerned entirely with determining suitable choices for the $u_j$ and give several
examples as well as a justification for each here. In each case, the $y_j$ will be
assumed to be either bounded linear functionals on $X$ or elements capable of
uniformly approximating them.

Polynomial  Networks 

A polynomial network is simply one in which each $u_j$ is a polynomial.
In the ridge function form, a polynomial network will be of the form

$$\sum_i p_i \circ \phi_i = \sum_i \sum_j c_{ij} [\phi_i(x)]^j.$$

The original Weierstrass Approximation theorem showed that polynomials
are capable of uniformly approximating any continuous function on a compact
interval of $\mathbb{R}$. Now, either Theorem 14 or Theorem
15 tells us that polynomials, when placed in the network, are capable of solving
the classification problem.
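
To make the one-dimensional Weierstrass step concrete, the following sketch uses Bernstein polynomials, one classical constructive route to the theorem (a choice made here for illustration, not a method taken from the theorems above). The target function, the degrees, and the helper names are all assumptions of the example; in a network, the argument t would be the output of a functional applied to the signal.

```python
from math import comb


def bernstein(f, n):
    """Return the degree-n Bernstein polynomial of f on [0, 1]."""
    weights = [f(k / n) * comb(n, k) for k in range(n + 1)]

    def poly(t):
        return sum(
            w * t**k * (1 - t) ** (n - k) for k, w in enumerate(weights)
        )

    return poly


def sup_error(f, g, points=501):
    """Largest |f - g| on a uniform grid over [0, 1]."""
    grid = [i / (points - 1) for i in range(points)]
    return max(abs(f(t) - g(t)) for t in grid)


def target(t):
    """A continuous but non-smooth target (hypothetical example)."""
    return abs(t - 0.5)


err_low = sup_error(target, bernstein(target, 10))
err_high = sup_error(target, bernstein(target, 200))
print(err_low, err_high)   # the error shrinks as the degree grows
```

The grid maximum is only a numerical estimate of the uniform error, but it shows the qualitative behavior the theorem guarantees: raising the degree drives the error toward zero.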

Exponential  Networks 

An exponential network, in which each of the elements $u_j$ is of the form
$\exp(\cdot)$, is the simplest to justify, as the proof of Sandberg's theorem is based
on showing first that the exponential function can be used as
the nonlinear element and then that any function capable of uniformly
approximating it on a bounded interval is also acceptable.

Continuous  Sigmoidal  Networks 

A  more  complicated  but  extremely  important  type  of  network  that  is 
useful  for  classification  is  a  continuous  sigmoidal  network.  It  is  first  necessary 
to  define  a  sigmoidal  function. 

Definition: A function $\sigma : \mathbb{R} \to \mathbb{R}$ is called a sigmoidal function or sigmoid
if

$$\lim_{t \to -\infty} \sigma(t) = 0 \quad \text{and} \quad \lim_{t \to \infty} \sigma(t) = 1.$$


In 1989, Cybenko (see [8]) proved that for any compact set $C \subset \mathbb{R}^n$, any
$f \in C(C)$, and any $\epsilon > 0$ there exists a function $g$ of the form

$$g(x) = \sum_{j=1}^{m} \alpha_j \sigma(\langle \gamma_j, x \rangle + \theta_j) \qquad (x, \gamma_j \in \mathbb{R}^n,\ \alpha_j, \theta_j \in \mathbb{R}),$$

where $\sigma$ is a continuous sigmoidal function, such that

$$|g(x) - f(x)| < \epsilon \quad \text{for all } x \in C.$$

In other words, this sum of translations and dilations of a sigmoidal
function is capable of uniformly approximating any continuous function
on a compact subset of $\mathbb{R}^n$. Sandberg mentions in [20] that, given that the
statement is true for $n = 1$, the (i) $\to$ (ii) section of his proof quickly extends
the result to $n > 1$. Indeed, let $X$ be simply $\mathbb{R}^n$, let the elements $y_j$ be linear
functionals defined on $\mathbb{R}^n$, and let $u_j(t) = c_j \sigma(\alpha_j t + \beta_j)$ where $c_j, \alpha_j, \beta_j \in \mathbb{R}$. This
gives us a sum of the type desired for $n > 1$.
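
A sum of this type is easy to evaluate directly. The sketch below assumes the logistic function as the continuous sigmoid and uses made-up weights; it only evaluates such a network and says nothing about how the weights would be found.

```python
import math


def logistic(t):
    """A continuous sigmoid: tends to 0 at -infinity and to 1 at +infinity."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)          # stable branch for large negative t
    return e / (1.0 + e)


def sigmoid_network(x, alphas, gammas, thetas, sigma=logistic):
    """Evaluate g(x) = sum_j alpha_j * sigma(<gamma_j, x> + theta_j)."""
    total = 0.0
    for alpha, gamma, theta in zip(alphas, gammas, thetas):
        inner = sum(g * xi for g, xi in zip(gamma, x))
        total += alpha * sigma(inner + theta)
    return total


# Two hidden units acting on a point of R^2 (all numbers are made up).
value = sigmoid_network(
    x=(1.0, -2.0),
    alphas=[2.0, -1.0],
    gammas=[(0.0, 0.0), (1.0, 0.5)],
    thetas=[0.0, 0.0],
)
print(value)   # 2 * 0.5 - 1 * 0.5 = 0.5, since both inner products are 0
```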

In [5], Cheney demonstrates as a result of the general theory of ridge
functions that the result is applicable when the elements of the vectors $\gamma_j$ and
the numbers $\theta_j$ are integers. In fact, the theorem is given as follows.

Theorem 16 Let $g$ be a continuous function on $\mathbb{R}$ such that the limits of $g(t)$
as $t \to \infty$ and $t \to -\infty$ exist and are different. Put $g_{ij}(t) = g(jt + i)$. Then
$\{g_{ij} : i, j \in \mathbb{Z}\}$ is fundamental in $C(\mathbb{R})$.

The proof of this theorem relies on measure theory, making use of the
Riesz Representation Theorem and the Dominated Convergence Theorem. It is
beyond the scope of this thesis but can be found in [4].

It is seen that this theorem allows $g$ to be a continuous sigmoid, but does
not require it. The only requirement when using the translations and dilations is
that the limits at $\infty$ and at $-\infty$ are not the same. It was mentioned earlier that
it is often desired that the output of the activation function in a neural
network be in a certain range such as [0, 1]. Sigmoidal functions fit nicely into
this framework.

Finally, we can show at once that these shifted and scaled sigmoidal
functions are capable of approximating on any normed linear space by using either
of the two main theorems after noting that they are capable of approximating
on $\mathbb{R}$.

Squashing  Function  Networks 

The previous section has dealt with the use of translations and dilations
of continuous sigmoidal functions. In this section, we will deal with a certain type
of sigmoid that is not necessarily continuous, a squashing function, and attempt
to obtain a similar result. A squashing function is defined in [12] as follows:

Definition: A function $\Psi : \mathbb{R} \to [0, 1]$ is a squashing function if it is
nondecreasing, $\lim_{\lambda \to \infty} \Psi(\lambda) = 1$, and $\lim_{\lambda \to -\infty} \Psi(\lambda) = 0$.

It is seen at once that this definition simply requires that $\Psi$ be a
nondecreasing sigmoidal function (not necessarily continuous). Some useful squashing
functions include the threshold function, $\Psi(\lambda) = 1_{\{\lambda \ge 0\}}$, where $1_{\{\cdot\}}$ is the
indicator function; the ramp function, $\Psi(\lambda) = \lambda 1_{\{0 \le \lambda \le 1\}} + 1_{\{\lambda > 1\}}$; and the cosine
squasher (see [10]), $\Psi(\lambda) = (1 + \cos[\lambda + 3\pi/2])(1/2) 1_{\{-\pi/2 \le \lambda \le \pi/2\}} + 1_{\{\lambda > \pi/2\}}$.
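
The three squashing functions just listed can be written out directly. The sketch below follows the formulas above; the helper is_nondecreasing is a hypothetical grid check added here only as a numerical sanity test, not a proof of monotonicity.

```python
import math


def threshold(lam):
    """Indicator of {lam >= 0}."""
    return 1.0 if lam >= 0 else 0.0


def ramp(lam):
    """Equal to lam on [0, 1], 0 to the left and 1 to the right."""
    return min(1.0, max(0.0, lam))


def cosine_squasher(lam):
    """(1 + cos(lam + 3*pi/2)) / 2 on [-pi/2, pi/2], 0 left of it, 1 right."""
    if lam < -math.pi / 2:
        return 0.0
    if lam > math.pi / 2:
        return 1.0
    return 0.5 * (1.0 + math.cos(lam + 1.5 * math.pi))


def is_nondecreasing(func, lo=-5.0, hi=5.0, steps=2001):
    """Check monotonicity on a uniform grid (sanity check only)."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    values = [func(t) for t in grid]
    return all(a <= b + 1e-12 for a, b in zip(values, values[1:]))


for psi in (threshold, ramp, cosine_squasher):
    print(psi.__name__, psi(-5.0), psi(5.0), is_nondecreasing(psi))
```

Of the three, only the threshold function is discontinuous, which is why the results that follow cannot simply reuse the continuous sigmoid argument.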

Hornik et al. first define what they call a sigma-pi network and prove
certain results pertaining to it. Following this, they extend the results to a
network resembling those that have been mentioned above. We proceed as
they did, considering only the $\mathbb{R}^1$ case.

Definition: For any measurable function $G$ mapping $\mathbb{R}$ to $\mathbb{R}$, let $\Sigma\Pi^1(G)$ be
the class of functions

$$\Big\{ f : \mathbb{R} \to \mathbb{R} : f(x) = \sum_{j=1}^{q} \beta_j \prod_{k=1}^{l_j} G(A_{jk}(x)),\ x, \beta_j \in \mathbb{R},\ A_{jk} \in \mathcal{A},\ q = 1, 2, \ldots \Big\},$$

where $l_j \in \mathbb{N}$ and $\mathcal{A}$ is the set of all affine functions from $\mathbb{R}$ to $\mathbb{R}$, that is, the
set of all functions of the form $A(x) = wx + b$ where $w, b \in \mathbb{R}$. Networks of
this form are referred to as sigma-pi networks.

Definition: For any measurable function $G$ mapping $\mathbb{R}$ to $\mathbb{R}$, let $\Sigma^1(G)$ be
the class of functions

$$\Big\{ f : \mathbb{R} \to \mathbb{R} : f(x) = \sum_{j=1}^{q} \beta_j G(A_j(x)),\ x, \beta_j \in \mathbb{R},\ A_j \in \mathcal{A},\ q = 1, 2, \ldots \Big\}.$$

The form of this second network clearly resembles the continuous
sigmoidal network that was shown above if $G$ is taken to be a continuous sigmoidal
function. The shifting and scaling that was present above is simply performed
by the affine function here; only the notation is different. For now, we will
continue to let $G$ be any measurable function.

We  now  give  the  main  result  that  applies  here. 

Theorem 17 For every squashing function $\Psi$, $\Sigma^1(\Psi)$ is uniformly dense on
compacta in $C(\mathbb{R})$.

Proof: We proceed by first proving several lemmas that will aid in the proof.

Lemma 1: Let $G : \mathbb{R} \to \mathbb{R}$ be continuous and nonconstant. Then $\Sigma\Pi^1(G)$ is
uniformly dense on compacta in $C(\mathbb{R})$.

Proof: We can apply the Stone-Weierstrass Theorem here. Let $C \subset \mathbb{R}$ be
any compact set. For any $G$, $\Sigma\Pi^1(G)$ is obviously an algebra on $C$. If $x$,
$y \in C$, $x \neq y$, then we can find an $A_1 \in \mathcal{A}$ such that $G(A_1(x)) \neq G(A_1(y))$.
To show this, pick $a, b \in \mathbb{R}$, $a \neq b$, such that $G(a) \neq G(b)$. Then choose $A_1(\cdot)$
to satisfy $A_1(x) = a$ and $A_1(y) = b$. Then $G(A_1(x)) \neq G(A_1(y))$. This ensures
that $\Sigma\Pi^1(G)$ is separating. Now we must show that $\Sigma\Pi^1(G)$ vanishes at no
point of $C$. Pick $b \in \mathbb{R}$ such that $G(b) \neq 0$ and set $A_2(x) = 0 \cdot x + b$. For all
$x \in C$, $G(A_2(x)) = G(b) \neq 0$, so this is a nonvanishing constant function. The
Stone-Weierstrass theorem now guarantees that $\Sigma\Pi^1(G)$ is capable of uniformly
approximating any continuous function on $C$.

This lemma shows that the sigma-pi networks are capable of uniform
approximation of any continuous function on a compact set, the only
requirements being that $G$ be continuous and nonconstant. We have
not yet required that $G$ be a squashing function.

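Lemma 2: Let $F$ be a continuous squashing function and $\Psi$ be an arbitrary
squashing function. For every $\epsilon > 0$ there is an element $H_\epsilon$ of $\Sigma^1(\Psi)$ such that
$\sup_{\lambda \in \mathbb{R}} |F(\lambda) - H_\epsilon(\lambda)| < \epsilon$.

Proof: Choose $\epsilon > 0$ and assume without loss of generality that $\epsilon < 1$. We
must now find constants $\beta_j$ and affine functions $A_j$, $j \in \{1, 2, \ldots, Q - 1\}$, such
that

$$\sup_{\lambda \in \mathbb{R}} \Big| F(\lambda) - \sum_{j=1}^{Q-1} \beta_j \Psi(A_j(\lambda)) \Big| < \epsilon.$$

Choose $Q$ such that $1/Q < \epsilon/2$. For $j \in \{1, 2, \ldots, Q - 1\}$, set $\beta_j = 1/Q$. Pick
$M > 0$ such that $\Psi(-M) < \epsilon/(2Q)$ and $\Psi(M) > 1 - \epsilon/(2Q)$. Such an $M$ can
be found because $\Psi$ is a squashing function. For $j \in \{1, 2, \ldots, Q - 1\}$, set
$r_j = \sup\{\lambda : F(\lambda) = j/Q\}$. Set $r_Q = \sup\{\lambda : F(\lambda) = 1 - 1/(2Q)\}$. Because $F$ is a
continuous squashing function, such $r_j$'s exist. Now, for any $r < s$, let $A_{r,s} \in \mathcal{A}$
be the unique affine function satisfying $A_{r,s}(r) = -M$ and $A_{r,s}(s) = M$, so that
$\Psi(A_{r,s}(\lambda))$ is within $\epsilon/(2Q)$ of 0 for $\lambda \le r$ and within $\epsilon/(2Q)$ of 1 for $\lambda \ge s$. The
desired approximation is then $H_\epsilon(\lambda) = \sum_{j=1}^{Q-1} \beta_j \Psi(A_{r_j, r_{j+1}}(\lambda))$. We can easily check
that on the intervals $(-\infty, r_1]$, $(r_1, r_2]$, $\ldots$, $(r_{Q-1}, r_Q]$, $(r_Q, \infty)$, $|F(\lambda) - H_\epsilon(\lambda)| < \epsilon$.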
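
The construction in this proof can be tried numerically. The sketch below makes several assumptions that are not part of the lemma: F is the logistic function (so each r_j is given by the logit), Psi is the ramp function (so M = 1 suffices), and each affine map is taken to be increasing, sending r_j to -M and r_{j+1} to +M so that each term switches from 0 to 1 across its interval. The names build_h_eps and big_m are hypothetical.

```python
import math


def logistic(t):
    """Continuous squashing function F (numerically stable form)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def logit(p):
    """Inverse of the logistic function on (0, 1)."""
    return math.log(p / (1.0 - p))


def ramp(lam):
    """Squashing function Psi; Psi(-1) = 0 and Psi(1) = 1, so M = 1 works."""
    return min(1.0, max(0.0, lam))


def build_h_eps(eps, big_m=1.0):
    """Return H_eps = sum_{j=1}^{Q-1} (1/Q) * Psi(A_j(lam)) for F = logistic."""
    q = math.floor(2.0 / eps) + 1            # guarantees 1/Q < eps/2
    r = [logit(j / q) for j in range(1, q)]  # r_1, ..., r_{Q-1}
    r.append(logit(1.0 - 1.0 / (2 * q)))     # r_Q

    def h_eps(lam):
        total = 0.0
        for lo, hi in zip(r, r[1:]):
            # Increasing affine map with A(lo) = -M and A(hi) = +M.
            a = big_m * (2.0 * (lam - lo) / (hi - lo) - 1.0)
            total += ramp(a) / q
        return total

    return h_eps


eps = 0.1
h = build_h_eps(eps)
grid = [-12.0 + 24.0 * i / 4000 for i in range(4001)]
worst = max(abs(logistic(t) - h(t)) for t in grid)
print(worst)   # grid estimate of the sup error; it stays below eps
```

With the ramp function each term is exactly 0 to the left of its interval and exactly 1 to the right, so the error on every interval is at most 2/Q, which is below eps by the choice of Q.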

Lemma 3: For every squashing function $\Psi$, every $\epsilon > 0$, and every $M > 0$ there
is a function $\cos_{M,\epsilon} \in \Sigma^1(\Psi)$ such that

$$\sup_{\lambda \in [-M, M]} |\cos_{M,\epsilon}(\lambda) - \cos(\lambda)| < \epsilon.$$

Proof: Let $F$ be the cosine squasher previously defined. By adding,
subtracting, and scaling a finite number of affinely shifted versions of $F$, we can get the
cosine function on any interval $[-M, M]$; call this finite combination $F_M$. Since $F$ is
continuous, we may apply Lemma 2 to each shifted copy of $F$ and use the triangle
inequality to obtain the result. Indeed, let $G$
be the element of $\Sigma^1(\Psi)$ obtained from $F_M$ by replacing each copy of $F$ with a
sufficiently accurate approximation from Lemma 2. We then have on the interval $[-M, M]$,

$$|G(\lambda) - \cos(\lambda)| \le |G(\lambda) - F_M(\lambda)| + |F_M(\lambda) - \cos(\lambda)| = |G(\lambda) - F_M(\lambda)| + 0 < \epsilon,$$

where the last inequality follows from Lemma 2.

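Lemma 4: Let $g(\cdot) = \sum_{j=1}^{Q} \beta_j \cos(A_j(\cdot))$, $A_j \in \mathcal{A}$. For an arbitrary squashing
function $\Psi$, arbitrary compact $C \subset \mathbb{R}$, and arbitrary $\epsilon > 0$, there is an $f \in \Sigma^1(\Psi)$
such that $\sup_{x \in C} |g(x) - f(x)| < \epsilon$.

Proof: Pick $M > 0$ such that for $j \in \{1, 2, \ldots, Q\}$, $A_j(C) \subset [-M, M]$.
Because $Q$ is finite, $C$ is compact, and the $A_j(\cdot)$ are continuous, such an $M$ can
be found. Let $Q' = Q \cdot \sum_{j=1}^{Q} |\beta_j|$. From Lemma 3, for all $x \in C$ we have

$$\Big| \sum_{j=1}^{Q} \beta_j \cos_{M,\epsilon/Q'}(A_j(x)) - g(x) \Big| < \epsilon.$$

Because $\cos_{M,\epsilon/Q'} \in \Sigma^1(\Psi)$, we see that
$f(\cdot) = \sum_{j=1}^{Q} \beta_j \cos_{M,\epsilon/Q'}(A_j(\cdot)) \in \Sigma^1(\Psi)$.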

Now we turn to proving the theorem. By Lemma 1, the trigonometric
polynomials $\{ \sum_{j=1}^{Q} \beta_j \prod_{k=1}^{l_j} \cos(A_{jk}(\cdot)) : Q, l_j \in \mathbb{N},\ \beta_j \in \mathbb{R},\ A_{jk} \in \mathcal{A} \}$ are uniformly
dense on compact sets in $C(\mathbb{R})$. By repeated application of the trigonometric
identity $\cos(a)\cos(b) = \frac{1}{2}[\cos(a + b) + \cos(a - b)]$, we may write every trigonometric
polynomial in the form $\sum_{t=1}^{T} \alpha_t \cos(A_t(\cdot))$ where $\alpha_t \in \mathbb{R}$ and $A_t \in \mathcal{A}$. The desired
result now follows from Lemma 4.
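
The rewriting step can be checked mechanically. The sketch below repeatedly applies the standard product-to-sum identity cos(u)cos(v) = (cos(u + v) + cos(u - v))/2 to a product of cosines of affine functions; the representation of an affine function as a pair (w, b) and the name expand_cos_product are assumptions of this example.

```python
import math


def expand_cos_product(affines):
    """Rewrite prod_k cos(w_k*x + b_k) as sum_t alpha_t*cos(w_t*x + b_t).

    `affines` is a list of (w, b) pairs; the result is a list of
    (alpha, w, b) triples.
    """
    w0, b0 = affines[0]
    terms = [(1.0, w0, b0)]
    for w, b in affines[1:]:
        new_terms = []
        for alpha, wt, bt in terms:
            # cos(u)cos(v) = (cos(u + v) + cos(u - v)) / 2
            new_terms.append((alpha / 2, wt + w, bt + b))
            new_terms.append((alpha / 2, wt - w, bt - b))
        terms = new_terms
    return terms


affines = [(1.0, 0.2), (2.0, -0.5), (0.5, 1.0)]
terms = expand_cos_product(affines)

x = 0.7
product = math.prod(math.cos(w * x + b) for w, b in affines)
as_sum = sum(alpha * math.cos(w * x + b) for alpha, w, b in terms)
print(product, as_sum)   # the two values agree up to rounding
```

Sums and differences of affine functions are again affine, which is why every term of the expansion has the form required by Lemma 4.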

This  now  gives  us  another  class  of  acceptable  functions  for  the  Uj  in 
Figure  6,  and  choosing  a  squashing  function  will  ensure  that  the  output  of  each 
Uj  is  always  between  0  and  1. 

Radial  Basis  Function  Networks 

An  important  type  of  function  that  may  be  used  in  some  classifying 
networks  is  the  radial  basis  function,  and  more  specifically,  the  Gaussian  basis 
function.  While  we  cannot  generalize  that  in  all  cases  a  basis  function  network 
may  be  used  for  uniform  approximation,  there  are  some  examples  that  are 
useful.  Information  about  the  universal  approximation  capability  of  radial  basis 
function  networks  may  be  found  in  [17].  We  define  a  radial  basis  function  as  a 
function  which  depends  only  on  the  norm  of  the  argument.  In  other  words,  if  / 
is  a  radial  basis  function  and  ||x||  =  \\y\\,  then  f(x)  =  f(y). 

We now give an example of a case when uniform approximation is
possible using a radial basis function network. In this particular instance the basis
functions are Gaussian functions, which have other useful properties for
approximation networks. Let $H$ be a Hilbert space with inner product $\langle \cdot, \cdot \rangle$ and
norm $\|\cdot\|$ defined in the usual way. We are interested mainly in $H = \mathbb{R}^n$
with $\|x\| = (\sum_j x_j^2)^{1/2}$. Let $C \subset H$ be compact and let $V \subset H$ be nonempty,
convex, and satisfy the condition that for $x_1, x_2 \in C$ with $x_1 \neq x_2$ there exists
$u \in V$ such that $\|x_1 - u\| \neq \|x_2 - u\|$. We can, for example, take $V$ to be $C$
as long as $C$ is convex, or we can take $V$ to be any nonempty convex subset
of $H$ containing an interior point. Let $P$ be a nonempty subset of $(0, \infty)$ or
$(-\infty, 0)$ that is closed under addition. Finally, let

$$L = \Big\{ g : C \to \mathbb{R} : g(x) = \sum_{j=1}^{m} a_j \exp(-\alpha_j \|x - v_j\|^2),\ m < \infty,\ a_j \in \mathbb{R},\ \alpha_j \in P,\ v_j \in V \Big\}.$$

It is immediately
seen that the structure of $L$ is of the form needed for the elements $u_j$ in Figure
6. With these assumptions we get the following theorem.

Theorem 18 Let $f : C \to \mathbb{R}$ be continuous and let $\epsilon > 0$. Then there exists a
$g \in L$ such that

$$|f(a) - g(a)| < \epsilon, \quad a \in C.$$

Proof: Using the property above and the convexity of $V$, we see that given $a_1$,
$a_2 \in \mathbb{R}$, $\alpha_1, \alpha_2 \in P$, and $v_1, v_2 \in V$,

$$a_1 \exp(-\alpha_1 \|x - v_1\|^2) \, a_2 \exp(-\alpha_2 \|x - v_2\|^2) = b \exp(-(\alpha_1 + \alpha_2) \|x - w\|^2)$$

for some $b \in \mathbb{R}$ and $w \in V$. Also we can see that $\alpha_1 + \alpha_2 \in P$. So $L$ is an
algebra. Choose $x_1$ and $x_2$ in $C$ and assume that $x_1 \neq x_2$. Then $\|x_1 - v\| \neq
\|x_2 - v\|$ for some $v \in V$ by our first assumption. Therefore $\exp(-\alpha \|x_1 - v\|^2) \neq
\exp(-\alpha \|x_2 - v\|^2)$ for any $\alpha \in P$, so $L$ separates the points of $C$. Since each
Gaussian is strictly positive, $L$ also vanishes at no point of $C$. Therefore, by the
Stone-Weierstrass theorem, the proof is complete.
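
The closure of L under multiplication can be made explicit. Completing the square gives w = (α_1 v_1 + α_2 v_2)/(α_1 + α_2), a convex combination of v_1 and v_2 when α_1 and α_2 share a sign, and b = a_1 a_2 exp(-α_1 α_2 ||v_1 - v_2||^2 / (α_1 + α_2)). The sketch below checks these formulas numerically in R^2; the specific numbers and the helper names are assumptions of the example.

```python
import math


def sq_dist(x, v):
    """Squared Euclidean distance ||x - v||^2."""
    return sum((xi - vi) ** 2 for xi, vi in zip(x, v))


def gaussian(a, alpha, v):
    """Return the function x -> a * exp(-alpha * ||x - v||^2)."""
    return lambda x: a * math.exp(-alpha * sq_dist(x, v))


def product_params(a1, alpha1, v1, a2, alpha2, v2):
    """Parameters (b, alpha, w) of the product of two Gaussian terms."""
    alpha = alpha1 + alpha2
    # w is a convex combination of v1 and v2 when alpha1, alpha2 share a sign.
    w = tuple((alpha1 * p + alpha2 * q) / alpha for p, q in zip(v1, v2))
    b = a1 * a2 * math.exp(-alpha1 * alpha2 / alpha * sq_dist(v1, v2))
    return b, alpha, w


g1 = gaussian(2.0, 0.5, (0.0, 1.0))
g2 = gaussian(-3.0, 1.5, (1.0, -1.0))
b, alpha, w = product_params(2.0, 0.5, (0.0, 1.0), -3.0, 1.5, (1.0, -1.0))
g12 = gaussian(b, alpha, w)

x = (0.3, 0.2)
print(g1(x) * g2(x), g12(x))   # the two values agree up to rounding
```

This is where both hypotheses are used: convexity of V keeps the new center w inside V, and closure of P under addition keeps the new width parameter inside P.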

Thus, in this somewhat less general setting of a compact subset of a Hilbert
space, the Gaussian basis
functions are capable of uniformly approximating any continuous real-valued function
on $C$. They therefore may be used as the elements $u_j$ in our original network.

7. Applications


Classifier  Example 


At  this  point  we  are  ready  to  give  an  example  of  an  actual  classification 
network  using  the  framework  that  we  have  provided.  This  example  will  also 
show  how  the  mathematical  formulations  that  we  have  been  making  relate  to 
the  problems  related  to  signal  classification  that  were  initially  discussed. 

Let $X$ be the space of continuous real-valued functions defined on $[0, 1]^n$
with $\|\cdot\|$ the usual sup norm. Let $k$ and $r$ be positive constants and let Lip($k$)
denote the subset of $X$ consisting of the elements of $X$ that satisfy a Lipschitz
condition: $|x(a) - x(b)| \le k|a - b|$ for all $a$ and $b$. This is a typical way to deal
with a good class of nonlinear functions. Let $x_1, x_2, \ldots, x_m$ be distinct elements
of Lip($k$) and let $C_j = \{x \in \text{Lip}(k) : \|x - x_j\| \le r\}$ for each $j = 1, 2, \ldots, m$.

Now assume that $r < (1/2) \min_{i \neq j} \|x_i - x_j\|$. It is clear that the $C_j$ are
pairwise disjoint if this condition is satisfied. Since each $C_j$ is a closed bounded
subset of $X$ that is equicontinuous on $[0, 1]^n$, the
Arzela-Ascoli theorem (see [15]) shows that the $C_j$ are compact. As we have
shown earlier, since the $C_j$ are compact, the union $\bigcup_j C_j$
is also compact.

We now introduce a theorem from [20] without the proof given there.

Theorem 19 Let $X$ denote the normed linear space of $\mathbb{R}$-valued continuous
functions on $I := [0, 1]^n$, with the usual max norm. Let $g \in X^*$, and let $\epsilon > 0$.
Then there are points $a_1, \ldots, a_p \in I$, numbers $c_1, \ldots, c_p \in \mathbb{R}$, and a $q \in X$ such
that

$$\sup_{x \in C} \Big| g(x) - \sum_{j=1}^{p} c_j x(a_j) \Big| < \epsilon$$

and

$$\sup_{x \in C} \Big| g(x) - \int_I q(a) x(a) \, da \Big| < \epsilon.$$

This theorem shows that a classifier can be found in this case using a
simple sampling and summing operation or an integration. It applies directly
to our example at hand since we are working on $[0, 1]^n$. We now know that it
is possible to classify the signals in our example using the structure in Figure
6 where the functional $y_j$ performs the sampling and summing or integration
operation.
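
For n = 1 the relation between the two forms can be illustrated with a quadrature rule. The sketch below assumes the midpoint rule to choose sample points and weights, together with a made-up weighting function and signal; the theorem itself only asserts that suitable points and coefficients exist and does not prescribe this choice.

```python
def sampling_functional(points, weights):
    """Return y with y(x) = sum_j c_j * x(a_j) for a signal x on [0, 1]."""
    def y(x):
        return sum(c * x(a) for a, c in zip(points, weights))
    return y


def midpoint_rule(q, p):
    """Sample points a_j and weights c_j = q(a_j) / p from the midpoint rule."""
    points = [(j + 0.5) / p for j in range(p)]
    weights = [q(a) / p for a in points]
    return points, weights


def kernel(a):
    """Hypothetical weighting function q."""
    return 1.0 + a


def signal(a):
    """Hypothetical signal x."""
    return a * a


y = sampling_functional(*midpoint_rule(kernel, 100))
exact = 1.0 / 3.0 + 1.0 / 4.0   # integral of (1 + a) * a^2 over [0, 1]
print(y(signal), exact)         # sample-and-sum value versus the integral
```

Once the points and weights are fixed, the same functional y can be applied to every incoming signal, which is what makes the sampling form attractive for a hardware or software front end.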

This  problem  is  very  applicable  to  the  examples  discussed  earlier.  If  n  = 
1,  2,  or  3,  we  are  classifying  continuous  signals  in  one,  two,  or  three  variables. 
This  is  the  kind  of  sensor  input  that  we  might  have  in  the  automatic  target 
identification  and  pattern  recognition  examples  that  were  mentioned  earlier. 

Conclusions 

We  have  described  a  specific  neural  network  structure  that  is  capable  of 
solving  certain  classification  problems.  This  structure  has  the  form  of  a  single 
hidden  layer  feedforward  neural  network  and  therefore  possesses  the  advantages 
of  neural  networks  that  were  mentioned  above.  It  has  a  simple  framework  that 
is  easily  built  in  hardware  or  simulated  in  software. 

It is important to note that there are limitations to the methods
presented here. All of the proofs are existence proofs. They guarantee that a
solution is possible and in some cases give a general idea of how it might be
accomplished. For example, we have seen how certain classes of functions such as
sigmoids and polynomials are capable of being used as the activation functions
(the $u_j$) in a classifying neural network. What has not been determined is the
number of nodes needed. We can only say that classification is possible with a
finite number of nodes. Further, we have not given a definite method of finding
the weights $c_j$ in Figure 6. This is typically what is referred to as training the
neural network.

In spite of these shortcomings, we have succeeded in providing a general
framework capable of studying the important problem of signal classification.
We have accomplished this by using well-known theorems dealing with
approximation. This area of research is fairly new and has proven extremely useful so
far, and interest in it will continue to grow in the future.